An Algebraic Approach for Specifying Compound Terms in Faceted Taxonomies
Yannis Tzitzikas (1), Anastasia Analyti (2), Nicolas Spyratos (3), Panos Constantopoulos (2,4)
(1) Istituto di Scienza e Tecnologie dell'Informazione, CNR-ISTI, Italy
(2) Institute of Computer Science, ICS-FORTH, Greece
(3) Laboratoire de Recherche en Informatique, Université de Paris-Sud, France
(4) Department of Computer Science, University of Crete, Greece

June 2003, Yannis Tzitzikas et al., EJC'2003

Outline of the presentation
Introduction and motivation
Faceted classification and faceted taxonomies
–Advantages and problems
Compound terms and compound taxonomies
The algebra
–Operations
–Examples
–Algorithms
–Deriving navigational trees
–Prototype implementation
Concluding remarks

Introduction
Existing ways to locate information on the Web:
–searching (using search engines like Google)
–browsing (using catalogues like Yahoo!, ODP)
Currently, search engines also exploit the catalogues:
–to improve the measurement of relevance
–to give the user a set of pages related to each page of the answer
–to limit the scope of the search
Web catalogues (or indices using controlled, structured vocabularies):
[-] index only a subset of the pages that are indexed by search engines
[+] ensure indexing consistency
[+] enable intelligent reasoning
[+] enable browsing

Drawbacks of the taxonomies used by Web catalogues
(1) Big size (e.g. currently the Open Directory has […] terms)
(2) Inconsistent and incomplete terminology and structuring
–Hard to understand
–Laborious browsing
–Laborious object indexing
–Hard to update/revise
–Large storage requirements
[Figure labels: USER, DESIGNER]

Faceted Classification and Faceted Taxonomies
Faceted classification was developed, prior to the existence of computers, by S. R. Ranganathan, a Hindu mathematician working as a librarian.
* A faceted taxonomy consists of a set of facets
* Each facet is a group of elemental concepts
* Each object is indexed by synthesizing elemental concepts
Key point: faceted taxonomies do not require an a priori division of concepts into subconcepts (only relationships between elemental concepts are stored).
Advantages of faceted taxonomies:
–they are easier to build and understand
–they require less storage space
–they are more scalable

Faceted Taxonomies (example)
Sports
  SeaSports
  WinterSports
Location
  Islands
    Crete
  Mainland
    Pilio
    Olympus

Example of using one taxonomy
–1 billion pages, in blocks of 10 pages
–100 million indexing terms (one per block)
–Complete and balanced decimal tree
–Total: 111,111,111 terms

Example of using a faceted taxonomy consisting of 4 facets
–1 billion pages, in blocks of 10 pages
–100 million indexing terms
–Each facet ends in 100 terms: 100 x 100 x 100 x 100 = 100 million combinations (400 terms in all)
–Total: 444 terms (4 facets of 1 + 10 + 100 = 111 terms each)

Example of using a faceted taxonomy consisting of 8 facets
–1 billion pages, in blocks of 10 pages
–100 million indexing terms
–Each facet ends in 10 terms: 10 x … x 10 = 100 million combinations (80 terms in all)
–Total: 88 terms! (8 facets of 1 + 10 = 11 terms each)
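
The term counts on the three preceding slides can be re-derived with a few lines of arithmetic. The sketch below assumes that every taxonomy (and every facet) is a complete, balanced decimal tree, as the single-taxonomy slide states; the function names are illustrative and not part of the presentation.

```python
def tree_size(leaves: int, branching: int = 10) -> int:
    """Count all nodes of a complete, balanced tree with `leaves` leaves."""
    size, level = 0, 1
    while level < leaves:
        size += level          # nodes on the current internal level
        level *= branching
    if level != leaves:
        raise ValueError("leaves must be a power of the branching factor")
    return size + leaves


def faceted_size(combinations: int, facets: int, branching: int = 10) -> int:
    """Total stored terms when equal-sized facets must yield `combinations` compound terms."""
    leaves_per_facet = round(combinations ** (1 / facets))
    if leaves_per_facet ** facets != combinations:
        raise ValueError("combinations must be a perfect power for this facet count")
    return facets * tree_size(leaves_per_facet, branching)


INDEXING_TERMS = 10**9 // 10   # 1 billion pages in blocks of 10 pages

print(tree_size(INDEXING_TERMS))        # single taxonomy: 111111111
print(faceted_size(INDEXING_TERMS, 4))  # 4 facets: 444
print(faceted_size(INDEXING_TERMS, 8))  # 8 facets: 88
```

The number of addressable blocks stays at 100 million in all three cases; only the number of terms that must be stored changes.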

The Problem of Faceted Taxonomies
Invalid compound terms may appear during object indexing or browsing/retrieval.
A compound term is invalid if it cannot be applied to any object of the domain.
Consequences:
–laborious/erroneous object indexing
–difficulties in browsing
[Figure: the Sports and Location facets of the example taxonomy]

Valid and Invalid Compound Terms
Example (facets Sports and Location):
Valid compound terms:
Sports.Location, Sports.Islands, Sports.Crete, Sports.Mainland, Sports.Pilio, Sports.Olympus,
SeaSports.Location, SeaSports.Islands, SeaSports.Crete, SeaSports.Mainland, SeaSports.Pilio,
WinterSports.Location, WinterSports.Mainland, WinterSports.Pilio, WinterSports.Olympus
Invalid compound terms:
SeaSports.Olympus, WinterSports.Islands, WinterSports.Crete

The Idea
Define an algebra whose operators specify the set of valid compound terms without enumerating all of them.
Initial operands: the facet terminologies.
Operations:
–product (n-ary): combines terms from different facets
–plus-product (n-ary): combines terms from different facets, plus positive modifiers
–minus-product (n-ary): combines terms from different facets, plus negative modifiers
–self-product (unary): combines terms from one facet
–self-plus-product (unary): combines terms from one facet, plus positive modifiers
–self-minus-product (unary): combines terms from one facet, plus negative modifiers

Compound Terms and Compound Taxonomies
Compound term: any subset s of the terminology T
Compound terminology S: a set of compound terms
Compound taxonomy: a pair (S, $\preceq$) where
–S is a compound terminology and
–$\preceq$ is the compound ordering over S
Example (taxonomy with terms Sports, Greece, and Crete, where Crete is narrower than Greece):
{Sports,Crete} $\preceq$ {Sports}, {Sports,Crete} $\preceq$ {Sports,Greece}
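
The slide's own definition of the compound ordering did not survive extraction. A minimal sketch follows, assuming the usual reading that s $\preceq$ s' holds when every term of s' has an equal or narrower term in s; this assumption reproduces both relations of the example. The `PARENT` map and the function names are hypothetical.

```python
# Hypothetical toy taxonomy: each term maps to its direct broader term.
PARENT = {"Crete": "Greece"}


def leq(term: str, other: str) -> bool:
    """True if `term` equals `other` or is narrower than it."""
    while term != other:
        if term not in PARENT:
            return False
        term = PARENT[term]
    return True


def compound_leq(s, s2) -> bool:
    """Assumed compound ordering: every term of s2 has an equal or narrower term in s."""
    return all(any(leq(t, t2) for t in s) for t2 in s2)


print(compound_leq({"Sports", "Crete"}, {"Sports"}))            # True
print(compound_leq({"Sports", "Crete"}, {"Sports", "Greece"}))  # True
print(compound_leq({"Sports"}, {"Sports", "Crete"}))            # False
```

Under this reading, adding terms to a compound term or replacing a term with a narrower one both make the compound term narrower.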

The Product Operation
S = { {Greece}, {Islands} }
S' = { {Sports}, {SeaSports} }
S $\oplus$ S' = { {Greece}, {Islands}, {Sports}, {SeaSports}, {Greece,Sports}, {Islands,Sports}, {Greece,SeaSports}, {Islands,SeaSports} }
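
One way to read the product is as the set of all unions that take one compound term from each operand. The sketch below makes a further assumption that the slide does not state: each operand implicitly contains the empty compound term, which would explain why the singletons appear in the result. `product_op` and `EMPTY` are illustrative names.

```python
from itertools import product as cartesian

EMPTY = frozenset()


def product_op(*terminologies):
    """All unions built by picking one compound term from each operand."""
    return {EMPTY.union(*combo) for combo in cartesian(*terminologies)}


# Operands of the slide, with the empty compound term added (an assumption).
S = {EMPTY, frozenset({"Greece"}), frozenset({"Islands"})}
S2 = {EMPTY, frozenset({"Sports"}), frozenset({"SeaSports"})}

result = product_op(S, S2) - {EMPTY}
for term in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(term))
```

With these operands the loop prints the four singletons followed by the four location/sport pairs, matching the eight compound terms shown on the slide.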

The Plus-Product Operation
S = { {Greece}, {Islands} }
S' = { {Sports}, {SeaSports}, {WinterSports}, {SnowSki} }
P = { {Islands,SeaSports}, {Greece,SnowSki} }
Result: { {Greece}, {Islands}, {Sports}, {SeaSports}, {WinterSports}, {SnowSki}, {Greece,Sports}, {Islands,Sports}, {Greece,SeaSports}, {Islands,SeaSports}, {Greece,WinterSports}, {Greece,SnowSki} }
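
The plus-product example can be reproduced with a small sketch, under stated assumptions: the result consists of the operands' own compound terms plus every cross-facet combination that is equal to or broader than some element of P. The taxonomy in `PARENT` (Islands under Greece, SeaSports and WinterSports under Sports, SnowSki under WinterSports) is inferred from the slide's figure, and the function names are hypothetical.

```python
from itertools import product as cartesian

# Assumed taxonomy: each term maps to its direct broader term.
PARENT = {"Islands": "Greece", "SeaSports": "Sports",
          "WinterSports": "Sports", "SnowSki": "WinterSports"}


def leq(term, other):
    """True if `term` equals `other` or is narrower than it."""
    while term != other:
        if term not in PARENT:
            return False
        term = PARENT[term]
    return True


def compound_leq(s, s2):
    """Assumed compound ordering: every term of s2 has an equal or narrower term in s."""
    return all(any(leq(t, t2) for t in s) for t2 in s2)


def plus_product(positive, *operands):
    """Operand terms plus the combinations that are broader than or equal to a parameter."""
    combos = {frozenset().union(*c) for c in cartesian(*operands)}
    derived = {s for s in combos if any(compound_leq(p, s) for p in positive)}
    return set().union(*operands) | derived


S = {frozenset({"Greece"}), frozenset({"Islands"})}
S2 = {frozenset({t}) for t in ("Sports", "SeaSports", "WinterSports", "SnowSki")}
P = [frozenset({"Islands", "SeaSports"}), frozenset({"Greece", "SnowSki"})]

result = plus_product(P, S, S2)
print(len(result))                                    # 12
print(frozenset({"Greece", "WinterSports"}) in result)  # True
print(frozenset({"Islands", "SnowSki"}) in result)      # False
```

Two parameters are enough to derive six valid pairs, because validity propagates upward to broader compound terms.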

The Minus-Product Operation
S = { {Greece}, {Islands} }
S' = { {Sports}, {SeaSports}, {WinterSports}, {SnowSki} }
N = { {Islands,WinterSports} }
Result: { {Greece}, {Islands}, {Sports}, {SeaSports}, {WinterSports}, {SnowSki}, {Greece,Sports}, {Islands,Sports}, {Greece,SeaSports}, {Islands,SeaSports}, {Greece,WinterSports}, {Greece,SnowSki} }
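
The minus-product example can be sketched in the same spirit, assuming that the operation starts from all cross-facet combinations and removes every combination that is equal to or narrower than some element of N. As before, the taxonomy in `PARENT` is inferred from the figure and the function names are hypothetical.

```python
from itertools import product as cartesian

# Assumed taxonomy: each term maps to its direct broader term.
PARENT = {"Islands": "Greece", "SeaSports": "Sports",
          "WinterSports": "Sports", "SnowSki": "WinterSports"}


def leq(term, other):
    """True if `term` equals `other` or is narrower than it."""
    while term != other:
        if term not in PARENT:
            return False
        term = PARENT[term]
    return True


def compound_leq(s, s2):
    """Assumed compound ordering: every term of s2 has an equal or narrower term in s."""
    return all(any(leq(t, t2) for t in s) for t2 in s2)


def minus_product(negative, *operands):
    """Operand terms plus all combinations, except those narrower than or equal to a parameter."""
    combos = {frozenset().union(*c) for c in cartesian(*operands)}
    kept = {s for s in combos if not any(compound_leq(s, n) for n in negative)}
    return set().union(*operands) | kept


S = {frozenset({"Greece"}), frozenset({"Islands"})}
S2 = {frozenset({t}) for t in ("Sports", "SeaSports", "WinterSports", "SnowSki")}
N = [frozenset({"Islands", "WinterSports"})]

result = minus_product(N, S, S2)
print(len(result))                                     # 12
print(frozenset({"Islands", "SnowSki"}) in result)       # False
print(frozenset({"Greece", "WinterSports"}) in result)   # True
```

A single negative parameter removes both {Islands,WinterSports} and the narrower {Islands,SnowSki}, so this expression yields the same twelve compound terms as the plus-product example with its two positive parameters.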

The Self-[Plus/Minus]-Product Operations
–Self-Product
–Self-Plus-Product
–Self-Minus-Product

The Self-Plus-Product: Example
S = { {Sports}, {SeaSports}, {WinterSports}, {SnowSki}, {SeaSki}, {WindSurfing}, {SnowBoard} }
P = { {SeaSki,WindSurfing}, {SnowSki,SnowBoard} }
Result: { {Sports}, {SeaSports}, {WinterSports}, {SnowSki}, {SeaSki}, {WindSurfing}, {SnowBoard}, {SeaSki,WindSurfing}, {SnowSki,SnowBoard} }

Expressions and Well-formed Expressions
The set of expressions over a facet set {F_1, …, F_k} is defined by a grammar whose terminals are the basic compound terminologies and whose productions apply the operations of the algebra.
An expression e is well-formed if:
(a) each basic compound terminology appears at most once in e,
(b) the parameters P/N are subsets of the corresponding genuine compound terms.
In this way:
–no conflicts arise
–the behavior is monotonic

Example: Building the catalog of a tourist portal
Facets:
–Location: Iraklio, Ammoudara, Hersonissos
–Accommodation: Furn.Apartments, Rooms, Bungalows
–Facilities: Indoor, Outdoor, Jacuzzi, SwimmingPool
3 facets, 13 terms, 890 compound terms, of which only 96 are valid.
Using positive parameters only:
P = { {Iraklio,Furn.Apartments}, {Iraklio,Rooms}, {Ammoudara,Furn.Apartments}, {Ammoudara,Rooms}, {Hersonissos,Furn.Apartments}, {Ammoudara,Bungalows,Jacuzzi}, {Hersonissos,Rooms,Indoor}, {Hersonissos,Bungalows,Outdoor} }
|P| = 8
Using both negative and positive parameters:
N = { {Iraklio,Bungalows} }
P = { {Hersonissos,Rooms,Indoor}, {Hersonissos,Bungalows,Outdoor}, {Ammoudara,Bungalows,Jacuzzi} }
|P| + |N| = 4

Checking the Validity of a Compound Term
Let $S_e$ be the compound terminology defined by an algebraic expression e.
We provide an algorithm for checking whether $s \in S_e$ without having to compute (and store) the entire $S_e$.
The time complexity for this algorithm is: […]
=> Only F and e have to be stored
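
The algorithm itself is not reproduced on the slide. As a rough illustration of how membership could be tested without materializing $S_e$, the sketch below walks the expression tree and consults only the taxonomy and the parameters. It is one possible reading under stated assumptions, not necessarily the authors' algorithm: the compound ordering is taken to mean that every term of the broader compound term has an equal or narrower term in the other, the checked compound term is assumed to hold at most one term per facet, and the classes `Basic`, `Plus`, `Minus` and the parameter N are hypothetical, with N chosen so that it yields the three invalid terms of the Sports/Location example.

```python
from dataclasses import dataclass

# Assumed taxonomy of the Sports/Location example: term -> direct broader term.
PARENT = {"SeaSports": "Sports", "WinterSports": "Sports",
          "Islands": "Location", "Mainland": "Location",
          "Crete": "Islands", "Pilio": "Mainland", "Olympus": "Mainland"}


def leq(term, other):
    """True if `term` equals `other` or is narrower than it."""
    while term != other:
        if term not in PARENT:
            return False
        term = PARENT[term]
    return True


def compound_leq(s, s2):
    """Assumed compound ordering: every term of s2 has an equal or narrower term in s."""
    return all(any(leq(t, t2) for t in s) for t2 in s2)


@dataclass(frozen=True)
class Basic:            # a facet terminology
    terms: frozenset


@dataclass(frozen=True)
class Plus:             # plus-product with positive parameters
    params: tuple
    operands: tuple


@dataclass(frozen=True)
class Minus:            # minus-product with negative parameters
    params: tuple
    operands: tuple


def terms_of(e):
    """All terms of the facets that occur in expression e."""
    if isinstance(e, Basic):
        return e.terms
    return frozenset().union(*(terms_of(op) for op in e.operands))


def is_valid(s, e):
    """Test membership of s in the terminology of e without enumerating it."""
    s = frozenset(s)
    if isinstance(e, Basic):
        return len(s) <= 1 and s <= e.terms
    if isinstance(e, Plus):
        if any(compound_leq(p, s) for p in e.params):
            return True     # s is broader than or equal to a positive parameter
        return any(s <= terms_of(op) and is_valid(s, op) for op in e.operands)
    if not s <= terms_of(e) or any(compound_leq(s, n) for n in e.params):
        return False        # foreign terms, or narrower than a negative parameter
    return all(is_valid(s & terms_of(op), op) for op in e.operands)


SPORTS = Basic(frozenset({"Sports", "SeaSports", "WinterSports"}))
LOCATION = Basic(frozenset({"Location", "Islands", "Mainland", "Crete", "Pilio", "Olympus"}))
N = (frozenset({"SeaSports", "Olympus"}), frozenset({"WinterSports", "Islands"}))
EXPR = Minus(N, (SPORTS, LOCATION))

print(is_valid({"SeaSports", "Crete"}, EXPR))     # True
print(is_valid({"WinterSports", "Crete"}, EXPR))  # False
```

Only the taxonomy and the expression with its parameters are consulted, which mirrors the slide's point that only F and e have to be stored.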

Generating Navigation Trees
Objective: given an expression e, dynamically generate a navigation tree whose nodes correspond to valid compound terms only, for use during object indexing and browsing.
The navigation tree also contains nodes for facet crossing.
[Figure: example navigation tree over the Sports and Location facets, with facet-crossing nodes byLocation and bySports]

Application in Web Catalogues
–Taxonomies of existing catalogs: big, incomplete, scalability problems
–Faceted Taxonomies + Algebra (P|N): small, clear, scalable
–Navigation Trees: generated dynamically

Prototype Implementation using an RDBMS
Three tables are used for storing the faceted taxonomy and the expression e:
–TERMS (name, id)
–SUBSUMPTION (term1, term2)
–PARAMETERS (F1, F2, ..., Fk)
Architecture:
–Components: Expression Builder, Storage Manager, Validity Checker, Nav. Tree Generator, RDBMS
–Users: Designer, Indexer/User
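
To make the three-table layout concrete, here is a minimal sketch using Python's built-in `sqlite3` module. Only the table and column names come from the slide. The column types, the reading of SUBSUMPTION as "term1 is directly narrower than term2", the reading of a PARAMETERS row as one parameter with one term per facet column, and the helper `broader_terms` are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TERMS (name TEXT UNIQUE NOT NULL, id INTEGER PRIMARY KEY);
    -- Assumed meaning: term1 is directly narrower than term2.
    CREATE TABLE SUBSUMPTION (term1 INTEGER REFERENCES TERMS(id),
                              term2 INTEGER REFERENCES TERMS(id));
    -- Assumed meaning: one row per parameter, one term per facet (here k = 2).
    CREATE TABLE PARAMETERS (F1 INTEGER REFERENCES TERMS(id),
                             F2 INTEGER REFERENCES TERMS(id));
""")

NAMES = ["Sports", "SeaSports", "WinterSports",
         "Location", "Islands", "Mainland", "Crete", "Pilio", "Olympus"]
IDS = {name: i for i, name in enumerate(NAMES, start=1)}
EDGES = [("SeaSports", "Sports"), ("WinterSports", "Sports"),
         ("Islands", "Location"), ("Mainland", "Location"),
         ("Crete", "Islands"), ("Pilio", "Mainland"), ("Olympus", "Mainland")]

conn.executemany("INSERT INTO TERMS (name, id) VALUES (?, ?)", list(IDS.items()))
conn.executemany("INSERT INTO SUBSUMPTION (term1, term2) VALUES (?, ?)",
                 [(IDS[a], IDS[b]) for a, b in EDGES])
# Hypothetical parameter {WinterSports, Islands}.
conn.execute("INSERT INTO PARAMETERS (F1, F2) VALUES (?, ?)",
             (IDS["WinterSports"], IDS["Islands"]))


def broader_terms(name):
    """Names of all terms strictly broader than `name`, via a recursive query."""
    rows = conn.execute("""
        WITH RECURSIVE up(id) AS (
            SELECT term2 FROM SUBSUMPTION
            WHERE term1 = (SELECT id FROM TERMS WHERE name = ?)
            UNION
            SELECT s.term2 FROM SUBSUMPTION AS s JOIN up ON s.term1 = up.id
        )
        SELECT name FROM TERMS WHERE id IN (SELECT id FROM up)
    """, (name,))
    return {row[0] for row in rows}


print(sorted(broader_terms("Crete")))  # ['Islands', 'Location']
```

Storing only direct subsumption links keeps the SUBSUMPTION table small; the transitive closure needed for validity checks can be computed on demand, as the recursive query does here.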

Concluding Remarks
Faceted taxonomies:
[+] conceptual clarity (they are easier to understand)
[+] compactness (they take less space)
[+] scalability (update operations are easier to formulate and can be performed more efficiently)
[-] invalid compound terms may appear
The proposed algebra:
[+] provides a solution to the problem of invalid compound terms
[+] aids indexing and browsing (and prevents errors)