An Algebraic Approach for Specifying Compound Terms in Faceted Taxonomies. Yannis Tzitzikas, Anastasia Analyti, Nicolas Spyratos, Panos Constantopoulos.

1 An Algebraic Approach for Specifying Compound Terms in Faceted Taxonomies
Yannis Tzitzikas (1), Anastasia Analyti (2), Nicolas Spyratos (3), Panos Constantopoulos (2,4)
(1) Istituto di Scienza e Tecnologie dell’Informazione, CNR-ISTI, Italy
(2) Institute of Computer Science, ICS-FORTH, Greece
(3) Laboratoire de Recherche en Informatique, Université de Paris-Sud, France
(4) Department of Computer Science, University of Crete, Greece

2 Outline of the presentation (June 2003, Yannis Tzitzikas et al., EJC'2003)
Introduction - Motivation
Faceted Classification and Faceted Taxonomies
  – Advantages and Problems
Compound Terms and Compound Taxonomies
The Algebra
  – Operations
  – Examples
  – Algorithms
  – Deriving Navigational Trees
  – Prototype implementation
Concluding Remarks

3 Introduction
Existing ways to locate information on the Web:
  – searching (using search engines like Google)
  – browsing (using catalogues like Yahoo!, ODP)
Currently, the catalogues are also exploited by the search engines:
  – for improving the measurement of relevance
  – for giving the user a set of pages related to each page of the answer
  – for limiting the scope of the search
Web catalogues (or indices using controlled structured vocabularies):
  [-] index only a subset of the pages that are indexed by search engines
  [+] ensure indexing consistency
  [+] enable intelligent reasoning
  [+] enable browsing

4 Drawbacks of the taxonomies that are used by Web Catalogues
Causes:
  (1) Big size (e.g. Open Directory currently has 460,000 terms)
  (2) Inconsistent and incomplete terminology and structuring
Resulting drawbacks (for the USER and the DESIGNER):
  – hard to understand
  – laborious browsing
  – laborious object indexing
  – hard to update/revise
  – large storage requirements

5 Faceted Classification and Faceted Taxonomies
Faceted classification was developed, prior to the existence of computers, by S. R. Ranganathan (1892-1972), a Hindu mathematician working as a librarian.
Key point: faceted taxonomies do not require an a priori division of concepts into subconcepts (only relationships between elemental concepts are stored).
* A faceted taxonomy consists of a set of facets
* Each facet is a group of elemental concepts
* Each object is indexed by synthesizing elemental concepts
Advantages of faceted taxonomies:
  – they are easier to build and understand
  – they require less storage space
  – they are more scalable

6 Faceted Taxonomies
[Figure: a faceted taxonomy with two facets]
Sports
  – SeaSports
  – WinterSports
Location
  – Islands: Crete
  – Mainland: Pilio, Olympus

7 Example of using one taxonomy
1 billion pages, blocks of 10 pages, 100 million indexing terms.
Complete and balanced decimal tree.
Total: 111,111,111 terms

8 Example of using a faceted taxonomy consisting of 4 facets
1 billion pages, blocks of 10 pages, 100 million indexing terms.
Each facet contributes 100 terms: 100 x 100 x 100 x 100 = 100 million combinations from 400 terms.
Total: 444 terms

9 Example of using a faceted taxonomy consisting of 8 facets
1 billion pages, blocks of 10 pages, 100 million indexing terms.
Each facet contributes 10 terms: 10 x … x 10 = 100 million combinations from 80 terms.
Total: 88 terms!
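The totals on slides 7 to 9 can be reproduced with a short calculation. The sketch below assumes that every taxonomy (and every facet) is a complete balanced decimal tree whose internal nodes and root are counted as terms; the function names `tree_terms` and `faceted_terms` are illustrative and do not come from the slides.

```python
def tree_terms(leaves: int, branching: int = 10) -> int:
    """Count all nodes of a complete balanced tree with `leaves` leaf terms."""
    total, level = 0, 1
    while level < leaves:          # add the root and every internal level
        total += level
        level *= branching
    if level != leaves:
        raise ValueError("leaves must be a power of the branching factor")
    return total + leaves          # finally add the leaf level itself


def faceted_terms(combinations: int, facets: int, branching: int = 10) -> int:
    """Terms needed when `facets` equal facets must yield `combinations` compound terms."""
    per_facet = round(combinations ** (1 / facets))   # leaf terms per facet
    if per_facet ** facets != combinations:
        raise ValueError("combinations must be a perfect power of the facet count")
    return facets * tree_terms(per_facet, branching)


single = tree_terms(100_000_000)             # one taxonomy, slide 7
four = faceted_terms(100_000_000, facets=4)  # slide 8
eight = faceted_terms(100_000_000, facets=8) # slide 9
print(single, four, eight)
```

Under these assumptions the single tree needs 1 + 10 + ... + 10^8 terms, while each of the 4 facets needs 1 + 10 + 100 terms and each of the 8 facets needs 1 + 10 terms.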

10 The Problem of Faceted Taxonomies
Invalid compound terms may appear during object indexing or browsing/retrieval.
A compound term is invalid if it cannot be applied to any object of the domain.
Consequences:
  – laborious/erroneous object indexing
  – difficulties in browsing
[Figure: the Sports and Location facets of slide 6]

11 Valid and Invalid Compound Terms
Example (the faceted taxonomy F with the Sports and Location facets of slide 6):
Valid compound terms:
  Sports.Location, Sports.Islands, Sports.Crete, Sports.Mainland, Sports.Pilio, Sports.Olympus
  SeaSports.Location, SeaSports.Islands, SeaSports.Crete, SeaSports.Mainland, SeaSports.Pilio
  WinterSports.Location, WinterSports.Mainland, WinterSports.Pilio, WinterSports.Olympus
Invalid compound terms:
  SeaSports.Olympus, WinterSports.Islands, WinterSports.Crete

12 The Idea
Define an algebra with operators that allow specifying the set of valid compound terms without having to enumerate all of the valid compound terms.
Initial operands: facet terminologies
Operations:
  – product (n-ary): combines terms from different facets
  – plus-product (n-ary): combines terms from different facets, plus positive modifiers
  – minus-product (n-ary): combines terms from different facets, plus negative modifiers
  – self-product (unary): combines terms from one facet
  – self-plus-product (unary): combines terms from one facet, plus positive modifiers
  – self-minus-product (unary): combines terms from one facet, plus negative modifiers

13 Compound Terms and Compound Taxonomies
Compound term: any subset s of T
Compound terminology S: a set of compound terms
Compound taxonomy: a pair (S, ⪯) where
  – S is a compound terminology and
  – ⪯ is the compound ordering on S
Example (terms Sports, Greece and Crete, with Crete narrower than Greece):
  {Sports,Crete} ⪯ {Sports}, {Sports,Crete} ⪯ {Sports,Greece}
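The definition of the compound ordering did not survive in this transcript. A minimal sketch, assuming the usual reading that s ⪯ s' holds when every term of s' is equal to or broader than some term of s, reproduces both examples on the slide. The `PARENT` map and the function names are illustrative assumptions.

```python
# Assumed term hierarchy for the slide's example: child -> parent.
PARENT = {"Crete": "Greece"}   # Sports and Greece have no parent


def leq(t, u):
    """True if term t equals u or is a descendant of u in PARENT."""
    while t is not None:
        if t == u:
            return True
        t = PARENT.get(t)
    return False


def compound_leq(s, s2):
    """Assumed compound ordering: every term of s2 subsumes some term of s."""
    return all(any(leq(t, t2) for t in s) for t2 in s2)


print(compound_leq({"Sports", "Crete"}, {"Sports"}))            # slide example 1
print(compound_leq({"Sports", "Crete"}, {"Sports", "Greece"}))  # slide example 2
print(compound_leq({"Sports"}, {"Sports", "Crete"}))            # reverse direction
```

Under this reading a compound term becomes narrower either by adding terms or by replacing a term with one of its descendants.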

14 The Product Operation
Operands:
  S = { {Greece}, {Islands} }
  S’ = { {Sports}, {SeaSports} }
Product of S and S’:
  {Greece}, {Islands}, {Sports}, {SeaSports},
  {Greece,Sports}, {Islands,Sports}, {Greece,SeaSports}, {Islands,SeaSports}
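The product on this slide can be sketched as a union over all choices of one compound term per operand. Because the slide's result also keeps the operands' own terms, the sketch assumes that each operand may contribute nothing (the empty set) and drops the empty result; `product_op` is an illustrative name, not taken from the slides.

```python
from itertools import product as cartesian

fs = frozenset


def product_op(*operands):
    """Union one compound term (or nothing) from each operand."""
    choices = [list(op) + [fs()] for op in operands]   # assumption: empty choice allowed
    result = {fs().union(*combo) for combo in cartesian(*choices)}
    result.discard(fs())                               # the slide shows no empty term
    return result


S = {fs({"Greece"}), fs({"Islands"})}
S2 = {fs({"Sports"}), fs({"SeaSports"})}

for term in sorted(product_op(S, S2), key=lambda s: (len(s), sorted(s))):
    print(sorted(term))
```

With the two operands above this yields the four single terms and the four pairs listed on the slide.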

15 The Plus-Product Operation
Operands:
  S = { {Greece}, {Islands} }
  S’ = { {Sports}, {SeaSports}, {WinterSports}, {SnowSki} }
Parameter:
  P = { {Islands,SeaSports}, {Greece,SnowSki} }
Result:
  {Greece}, {Islands}, {Sports}, {SeaSports}, {WinterSports}, {SnowSki},
  {Greece,Sports}, {Islands,Sports}, {Greece,SeaSports}, {Islands,SeaSports},
  {Greece,WinterSports}, {Greece,SnowSki}
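The plus-product result on this slide is consistent with the following reading, offered as a sketch rather than as the authors' definition: keep the operands' own terms and add every combination that is equal to or broader than some element of P. The hierarchy in `PARENT` (Islands under Greece, SeaSports and WinterSports under Sports, SnowSki under WinterSports) and all function names are assumptions.

```python
from itertools import product as cartesian

fs = frozenset
# Assumed hierarchy for the slide's terms: child -> parent.
PARENT = {"Islands": "Greece", "SeaSports": "Sports",
          "WinterSports": "Sports", "SnowSki": "WinterSports"}


def leq(t, u):
    """True if term t equals u or is a descendant of u."""
    while t is not None:
        if t == u:
            return True
        t = PARENT.get(t)
    return False


def compound_leq(s, s2):
    """Assumed compound ordering: every term of s2 subsumes some term of s."""
    return all(any(leq(t, t2) for t in s) for t2 in s2)


def candidates(*operands):
    """All non-empty unions taking at most one compound term per operand."""
    choices = [list(op) + [fs()] for op in operands]
    return {fs().union(*c) for c in cartesian(*choices)} - {fs()}


def plus_product(P, *operands):
    """Operands' own terms plus every candidate broader than some p in P."""
    own = set().union(*operands)
    broader = {s for s in candidates(*operands)
               if any(compound_leq(p, s) for p in P)}
    return own | broader


S = {fs({"Greece"}), fs({"Islands"})}
S2 = {fs({t}) for t in ("Sports", "SeaSports", "WinterSports", "SnowSki")}
P = {fs({"Islands", "SeaSports"}), fs({"Greece", "SnowSki"})}
result = plus_product(P, S, S2)
print(len(result))
```

Only the two elements of P are declared, and the remaining valid pairs such as {Greece,WinterSports} follow from them because they are broader.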

16 The Minus-Product Operation
Operands:
  S = { {Greece}, {Islands} }
  S’ = { {Sports}, {SeaSports}, {WinterSports}, {SnowSki} }
Parameter:
  N = { {Islands,WinterSports} }
Result:
  {Greece}, {Islands}, {Sports}, {SeaSports}, {WinterSports}, {SnowSki},
  {Greece,Sports}, {Islands,Sports}, {Greece,SeaSports}, {Islands,SeaSports},
  {Greece,WinterSports}, {Greece,SnowSki}
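The minus-product result on this slide is consistent with starting from all combinations and removing every one that is equal to or narrower than some element of N. The sketch below encodes that reading; the hierarchy in `PARENT` and the function names are assumptions made for illustration.

```python
from itertools import product as cartesian

fs = frozenset
# Assumed hierarchy for the slide's terms: child -> parent.
PARENT = {"Islands": "Greece", "SeaSports": "Sports",
          "WinterSports": "Sports", "SnowSki": "WinterSports"}


def leq(t, u):
    """True if term t equals u or is a descendant of u."""
    while t is not None:
        if t == u:
            return True
        t = PARENT.get(t)
    return False


def compound_leq(s, s2):
    """Assumed compound ordering: every term of s2 subsumes some term of s."""
    return all(any(leq(t, t2) for t in s) for t2 in s2)


def candidates(*operands):
    """All non-empty unions taking at most one compound term per operand."""
    choices = [list(op) + [fs()] for op in operands]
    return {fs().union(*c) for c in cartesian(*choices)} - {fs()}


def minus_product(N, *operands):
    """All candidates except those narrower than (or equal to) some n in N."""
    return {s for s in candidates(*operands)
            if not any(compound_leq(s, n) for n in N)}


S = {fs({"Greece"}), fs({"Islands"})}
S2 = {fs({t}) for t in ("Sports", "SeaSports", "WinterSports", "SnowSki")}
N = {fs({"Islands", "WinterSports"})}
result = minus_product(N, S, S2)
print(len(result))
```

Declaring {Islands,WinterSports} invalid also removes the narrower {Islands,SnowSki}, which leaves the twelve terms listed on the slide out of fourteen candidates.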

17 The Self-[Plus/Minus]-Product Operations
  – Self-Product
  – Self-Plus-Product
  – Self-Minus-Product

18 The Self-Plus-Product: Example
Operand:
  S = { {Sports}, {SeaSports}, {WinterSports}, {SeaSki}, {WindSurfing}, {SnowSki}, {SnowBoard} }
Parameter:
  P = { {SeaSki,WindSurfing}, {SnowSki,SnowBoard} }
Result:
  {Sports}, {SeaSports}, {WinterSports}, {SeaSki}, {WindSurfing}, {SnowSki}, {SnowBoard},
  {SeaSki,WindSurfing}, {SnowSki,SnowBoard}

19 Expressions and Well-formed Expressions
The set of expressions over a facet set {F_1, …, F_k} is defined according to a grammar.
An expression e is well-formed if:
  (a) each basic compound terminology appears at most once in e,
  (b) the parameters P/N are subsets of the corresponding genuine compound terms.
In this way:
  – no conflicts arise
  – the behavior is monotonic

20 Example: Building the catalog of a tourist portal
Facets and terms:
  – Location: Iraklion, Ammoudara, Hersonissos
  – Accommodation: Furn. Apartments, Rooms, Bungalows
  – Facilities: Jacuzzi, SwimmingPool, Indoor, Outdoor
3 facets, 13 terms, 890 compound terms, of which only 96 are valid.
Specification with a positive parameter only:
  P = { {Iraklion, Furn. Apartments}, {Iraklion, Rooms}, {Ammoudara, Furn. Apartments}, {Ammoudara, Rooms}, {Hersonissos, Furn. Apartments}, {Ammoudara, Bungalows, Jacuzzi}, {Hersonissos, Rooms, Indoor}, {Hersonissos, Bungalows, Outdoor} }
  |P| = 8
Specification with a negative and a positive parameter:
  N = { {Iraklion, Bungalows} }
  P = { {Hersonissos, Rooms, Indoor}, {Hersonissos, Bungalows, Outdoor}, {Ammoudara, Bungalows, Jacuzzi} }
  |P| + |N| = 4

21 Checking the Validity of a Compound Term
Let S_e be the compound terminology defined by an algebraic expression e.
We provide an algorithm for checking whether s ∈ S_e without having to compute (and store) the entire S_e.
=> Only F and e have to be stored
[The time complexity formula of the algorithm is not recoverable from this transcript]
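The algorithm itself is not shown on the slide. The following is only a sketch of how such a membership test could work without materializing S_e, assuming a well-formed expression (each facet appears once), the compound ordering reading used for the slide 13 examples, and plus/minus semantics inferred from slides 15 and 16. The classes `Facet`, `Plus`, `Minus` and the function `is_valid` are hypothetical names.

```python
from dataclasses import dataclass

fs = frozenset
# Assumed hierarchy for the slide 15/16 terms: child -> parent.
PARENT = {"Islands": "Greece", "SeaSports": "Sports",
          "WinterSports": "Sports", "SnowSki": "WinterSports"}


def leq(t, u):
    while t is not None:
        if t == u:
            return True
        t = PARENT.get(t)
    return False


def compound_leq(s, s2):
    """Assumed compound ordering: every term of s2 subsumes some term of s."""
    return all(any(leq(t, t2) for t in s) for t2 in s2)


@dataclass(frozen=True)
class Facet:            # basic compound terminology of one facet
    terms: frozenset


@dataclass(frozen=True)
class Plus:             # plus-product with positive parameter P
    operands: tuple
    P: frozenset


@dataclass(frozen=True)
class Minus:            # minus-product with negative parameter N
    operands: tuple
    N: frozenset


def vocabulary(e):
    """All terms that can occur in compound terms of expression e."""
    if isinstance(e, Facet):
        return e.terms
    return fs().union(*(vocabulary(o) for o in e.operands))


def is_valid(s, e):
    """Test s against expression e recursively, never building S_e."""
    s = fs(s)
    if not s <= vocabulary(e):
        return False
    if isinstance(e, Facet):
        return len(s) <= 1
    parts = [s & vocabulary(o) for o in e.operands]   # projection per operand
    if isinstance(e, Minus):
        return (all(is_valid(p, o) for p, o in zip(parts, e.operands))
                and not any(compound_leq(s, n) for n in e.N))
    # Plus: s lies entirely in one valid operand, or is broader than some p in P
    if any(p == s and is_valid(s, o) for p, o in zip(parts, e.operands)):
        return True
    return any(compound_leq(p, s) for p in e.P)


LOC = Facet(fs({"Greece", "Islands"}))
SPORT = Facet(fs({"Sports", "SeaSports", "WinterSports", "SnowSki"}))
plus_e = Plus((LOC, SPORT), fs({fs({"Islands", "SeaSports"}), fs({"Greece", "SnowSki"})}))
minus_e = Minus((LOC, SPORT), fs({fs({"Islands", "WinterSports"})}))
print(is_valid({"Greece", "WinterSports"}, plus_e), is_valid({"Islands", "SnowSki"}, minus_e))
```

The point of the design is the one made on the slide: only the facets and the expression (with its P and N parameters) are stored, and each check walks the expression tree.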

22 Generating Navigation Trees
Objective: given an expression e, dynamically generate a navigation tree whose nodes correspond to valid compound terms only, for use during object indexing and browsing.
The navigation tree also contains nodes for facet crossing.
[Figure: example navigation tree over the Sports facet (SeaSports, WinterSports) and the Location facet (Islands, Mainland, Crete, Pilio, Olympus), with facet-crossing nodes byLocation and bySports]

23 Application in Web Catalogues
  – Taxonomies of existing catalogs: big, incomplete, scalability problems
  – Faceted taxonomies + algebra (parameters P | N): small, clear, scalable
  – Navigation trees: derived dynamically

24 Prototype Implementation using an RDBMS
Three tables are used for storing the faceted taxonomy and the expression e:
  – TERMS (name, id)
  – SUBSUMPTION (term1, term2)
  – PARAMETERS (F1, F2, ..., Fk)
Architecture:
  – Components: Expression Builder, Storage Manager, Validity Checker, Nav. Tree Generator, on top of the RDBMS
  – Actors: Designer, Indexer/User
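The slide names the three tables but not their column types or how rows are interpreted. The sketch below is one possible layout using Python's built-in `sqlite3` module, assuming that SUBSUMPTION rows mean "term1 is directly narrower than term2" and that each PARAMETERS row stores one compound term with one column per facet (here only two facets). These interpretations, the lowercase table names, and the helper `broader_terms` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE terms (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
-- assumption: term1 is directly narrower than term2
CREATE TABLE subsumption (term1 INTEGER REFERENCES terms(id),
                          term2 INTEGER REFERENCES terms(id));
-- assumption: one row per parameter compound term, one column per facet
CREATE TABLE parameters (f1 INTEGER REFERENCES terms(id),
                         f2 INTEGER REFERENCES terms(id));
""")

names = ["Greece", "Islands", "Sports", "WinterSports", "SnowSki"]
conn.executemany("INSERT INTO terms (name) VALUES (?)", [(n,) for n in names])
ids = dict(conn.execute("SELECT name, id FROM terms"))

edges = [("Islands", "Greece"), ("WinterSports", "Sports"), ("SnowSki", "WinterSports")]
conn.executemany("INSERT INTO subsumption VALUES (?, ?)",
                 [(ids[a], ids[b]) for a, b in edges])
conn.execute("INSERT INTO parameters VALUES (?, ?)",
             (ids["Islands"], ids["WinterSports"]))


def broader_terms(conn, name):
    """Return the term itself and all of its ancestors, via a recursive query."""
    rows = conn.execute("""
        WITH RECURSIVE up(id) AS (
            SELECT id FROM terms WHERE name = ?
            UNION
            SELECT s.term2 FROM subsumption s JOIN up ON s.term1 = up.id
        )
        SELECT name FROM terms WHERE id IN (SELECT id FROM up)
    """, (name,)).fetchall()
    return {r[0] for r in rows}


print(broader_terms(conn, "SnowSki"))
```

A validity checker built on such tables would need exactly this kind of ancestor lookup to compare a compound term against the stored parameters.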

25 Concluding Remarks
Faceted taxonomies:
  [+] conceptual clarity (they are easier to understand)
  [+] compactness (they take less space)
  [+] scalability (update operations are easier to formulate and can be performed more efficiently)
  [-] invalid compound terms may appear
The proposed algebra:
  [+] provides a solution to the problem of invalid compound terms
  [+] aids indexing and browsing (and prevents errors)

