Meaningful Labeling of Integrated Query Interfaces

1 Meaningful Labeling of Integrated Query Interfaces
Eduard C. Dragut (speaker), Clement Yu, Weiyi Meng; University of Illinois at Chicago, SUNY at Binghamton; VLDB 2006, Seoul, Korea

2 A Motivating Scenario: Looking for a Ticket
Chicago – Seoul, September 10th – September 17th (delta.com, orbitz.com, expedia.com). A user looking for the “best” price for a ticket has to explore multiple sources, which is tedious, frustrating, and time-consuming. E. Dragut et al - Meaningful Labeling of Integrated Query Interfaces

3 Unified query interface
The goal: provide a unified way to query multiple sources in the same domain. Over the Web sources (e.g., Airfare.com, priceline.com, united.com, delta.com, nwa.com), the user formulates the query once, on a unified query interface.

4 Overview: Integrating Query Interfaces
Starting from the (Deep) Web: extract query interfaces [He05, Zhang04], which come in various formats (e.g., ASCII files); cluster the query interfaces by domain, e.g., Auto, Car Rental, Books, Airfare [Peng04]; match the query interfaces [B.He03, Dhamankar04, Doan02, Madhavan05, Wu04]; integrate the interfaces [H.He03, Dragut06].

5 Overview: Integrating Query Interfaces
Integration steps: (1) structural merging of the query interfaces [He03 et al, Dragut06 et al], preserving grouping constraints and ancestor-descendant relationships; (2) determining the domain of each global field in the integrated interface [He03 et al]; (3) meaningful labeling of the integrated interface, which is the topic of this presentation.

6 Motivation of Naming
A query interface needs to be easily understood by any user, irrespective of his/her background. Our study of the query interfaces in the seven domains used in our experiments revealed that their designers follow some “hidden” norms: there are certain relationships between the labels of the fields in the same group (e.g., all are plurals), and the labels of the (super)groups semantically characterize the set of fields underneath them. The semantic ambiguity problem: synonyms and homonyms are the two sources of naming conflicts [Batini86 et al, Bright94 et al].

7 The Objectives
The main goal is to provide a systematic way to label the fields of the integrated query interface so that its concepts are easily understood by ordinary users; this is validated through a user survey. We also provide a set of desirable properties that the labels of the attributes within an integrated interface must satisfy to be consistent, so that users have no difficulty understanding the interface (not covered in detail).

8 Naming Algorithm: The Input
(1) A set of query interfaces in the same domain (e.g., in the Airline domain: Delta, AA, NWA, Orbitz, Travelocity), each represented hierarchically [Wu04]; (2) the mapping between the fields of the query interfaces, organized in clusters (e.g., [Wu04 et al, B.He03 et al]); (3) the set of groups of fields given by the merge algorithm [Dragut06 et al]; (4) the integrated query interface, given by the merge algorithm as a schema tree [Dragut06 et al]. Figure: vacations.net.

9 An Example of Input
Three fragments of query interfaces represented hierarchically, and the mapping between them, i.e., the set of clusters. Each entry is (interface, field), with null where the interface has no such field.
Clusters c_DepCity, c_DestCity, c_DepMonth, c_DepDay, c_DepTime, c_DepYear:
(Travel,3) (Travel,4) (Travel,7) (Travel,6) (Travel,8) (Travel,null)
(PriceLine,2) (PriceLine,3) (PriceLine,5) (PriceLine,6) (PriceLine,null) (PriceLine,7)
(British,2) (British,3) (British,9) (British,8) (British,null)
Clusters c_Adults, c_Infants, c_Children, c_Seniors, c_Airlines, c_Class:
(Travel,14) (Travel,15) (Travel,16) (Travel,12)
(PriceLine,12) (PriceLine,14) (PriceLine,13)
(British,5) (British,6) (British,13)
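The cluster mapping above can be held in a plain dictionary. The sketch below is a minimal illustration, not the authors' data structure: it assumes each (interface, field) pair names a field id in that interface's schema tree, encodes null as None, and reads the entries of four departure clusters off the slide assuming its flattened columns appear in order. The helper names `cluster_of` and `missing_clusters` are hypothetical.

```python
from typing import Dict, List, Optional

# A cluster maps each source interface to the id of its field in that cluster,
# or None where the slide shows "null" (the interface has no such field).
# Entries are read off the example slide, assuming its flattened columns are in order.
Clusters = Dict[str, Dict[str, Optional[int]]]

CLUSTERS: Clusters = {
    "c_DepCity": {"Travel": 3, "PriceLine": 2, "British": 2},
    "c_DestCity": {"Travel": 4, "PriceLine": 3, "British": 3},
    "c_DepMonth": {"Travel": 7, "PriceLine": 5, "British": 9},
    "c_DepTime": {"Travel": 8, "PriceLine": None, "British": None},
}


def cluster_of(clusters: Clusters, interface: str, field_id: int) -> Optional[str]:
    """Return the cluster that contains field `field_id` of `interface`, if any."""
    for name, members in clusters.items():
        if field_id is not None and members.get(interface) == field_id:
            return name
    return None


def missing_clusters(clusters: Clusters, interface: str) -> List[str]:
    """Clusters for which `interface` has no field (a null entry)."""
    return [name for name, members in clusters.items()
            if interface in members and members[interface] is None]


print(cluster_of(CLUSTERS, "PriceLine", 5))   # c_DepMonth
print(missing_clusters(CLUSTERS, "British"))  # ['c_DepTime']
```

Null entries matter later: a field that exists in only some interfaces still needs a label drawn from the interfaces that do have it.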

10 Naming Algorithm - Sketch
Step 1: Consistent labeling of the fields. Fields in the same group are labeled using an intersect-and-union strategy; isolated fields require no consistency; root fields are treated as a group. Output: each group of fields (or isolated field) has a set of candidate labels, possibly empty.
Step 2: Consistent labeling of the internal nodes. For each internal node, proceeding from the lowest level up to the root, apply a set of inference rules on labels. Output: each internal node has a set of candidate labels, possibly empty.
Step 3: Enforce consistency within the entire integrated interface (not covered).
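Step 2 visits internal nodes from the lowest level up to the root, so that every node is labeled after all of its descendants. A minimal sketch of that visiting order follows; the `Node` class and the example tree are hypothetical stand-ins for the integrated schema tree, and the inference rules themselves are left as a placeholder comment.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical schema-tree node: leaves are fields, internal nodes are groups."""
    name: str
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def internal_nodes_bottom_up(root: Node) -> List[Node]:
    """Internal nodes ordered from the deepest level up to the root (Step 2 order)."""
    found: List[Tuple[int, Node]] = []
    stack = [(0, root)]
    while stack:
        depth, node = stack.pop()
        if not node.is_leaf():
            found.append((depth, node))
            stack.extend((depth + 1, child) for child in node.children)
    found.sort(key=lambda pair: pair[0], reverse=True)  # deepest level first
    return [node for _, node in found]


# Illustrative tree (node names are made up for the example).
tree = Node("root", [
    Node("passengers", [Node("adults"), Node("children")]),
    Node("when", [Node("departure", [Node("month"), Node("day")]), Node("time")]),
])
for node in internal_nodes_bottom_up(tree):
    # A real implementation would apply the label inference rules here.
    print(node.name)
```

Sorting by depth gives a strict level-by-level order; a plain post-order traversal would also guarantee that children are handled before parents.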

11 Preliminaries
Normalization [e.g., He03 et al, Madhavan01 et al, Rahm01 et al]: e.g., Adults (18-64) becomes adult. Semantic relationships among complex (multi-word) labels, such as synonymy and hypernymy/hyponymy, need to be established. The main issue is that thesauri (e.g., WordNet [Fellbaum98]) provide semantic relationships only for individual content words. How can we show that Area of Study is a synonym of Field of Work in the Job domain? How can we show that Class is a hypernym of Class of Tickets in the Airline domain?
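As an illustration of the normalization step, the sketch below turns a label into a set of content words: it lower-cases the text, drops parenthesized notes such as "(18-64)", removes stop words, and strips plurals. The stop-word list, the irregular-plural table, and the naive "strip a trailing s" rule are assumptions made for this example; a real system would use a proper lemmatizer.

```python
import re

# Assumed, deliberately tiny stop-word list and irregular plural table.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "or", "the", "to"}
IRREGULAR_PLURALS = {"children": "child", "people": "person"}


def normalize(label: str) -> frozenset:
    """Turn a field label into a set of normalized content words (a crude sketch)."""
    text = re.sub(r"\([^)]*\)", " ", label.lower())  # drop notes such as "(18-64)"
    words = set()
    for word in re.findall(r"[a-z]+", text):
        if word in STOP_WORDS:
            continue
        if word in IRREGULAR_PLURALS:
            word = IRREGULAR_PLURALS[word]
        elif len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
            word = word[:-1]  # naive plural stripping: "adults" -> "adult"
        words.add(word)
    return frozenset(words)


print(sorted(normalize("Adults (18-64)")))    # ['adult']
print(sorted(normalize("Class of Tickets")))  # ['class', 'ticket']
```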

12 Preliminaries: Manipulation of Labels
A label is seen as a set of normalized content words; e.g., {area, study} corresponds to Area of Study, and {field, work} corresponds to Field of Work. Area of Study is a synonym of Field of Work because Area is a synonym of Field and Study is a synonym of Work (by WordNet). Most descriptive vs. most general labels: e.g., among Category, Job Category, Area of Work, and Function, the labels Category and Function are too general, while Job Category and Area of Work are descriptive and avoid confusion.
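With labels as word sets, synonymy between two multi-word labels can be reduced to pairing their words one-to-one with word-level synonyms. The sketch below does this by brute force over permutations, which is fine for labels of a few words. The `SYNSETS` list is a hand-made stand-in for WordNet holding only the two pairs the slide mentions; treating "proper subset of words" as "more general" and "most words" as "most descriptive" are simple heuristics assumed for illustration, not the paper's definitions.

```python
from itertools import permutations

# Hypothetical word-level synonym sets standing in for WordNet synsets;
# the two pairs below are the ones the slide attributes to WordNet.
SYNSETS = [{"area", "field"}, {"study", "work"}]


def words_synonymous(a: str, b: str) -> bool:
    return a == b or any(a in s and b in s for s in SYNSETS)


def labels_synonymous(l1: frozenset, l2: frozenset) -> bool:
    """True if the words of l1 pair up one-to-one with synonymous words of l2."""
    if len(l1) != len(l2):
        return False
    first = sorted(l1)
    return any(all(words_synonymous(a, b) for a, b in zip(first, perm))
               for perm in permutations(sorted(l2)))


def more_general(l1: frozenset, l2: frozenset) -> bool:
    """Assumed heuristic: l1 is more general if its words are a proper subset of l2's."""
    return l1 < l2


def most_descriptive(labels):
    """Among candidate labels, prefer the one with the most content words."""
    return max(labels, key=len)


print(labels_synonymous(frozenset({"area", "study"}), frozenset({"field", "work"})))  # True
print(more_general(frozenset({"class"}), frozenset({"class", "ticket"})))             # True
```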

13 Consistent Labeling of Groups of Fields
Assumption: the labels a query interface gives to the fields in the same group are consistent. The labels of a group are organized in a relation-like form, called a group relation, with one row per source interface and one column per cluster. General idea for building a consistent solution: combine multiple rows of consistent labels until a label is assigned to each field in the group. Example group relation (clusters c_Senior, c_Adult, c_Child, c_Infant): aa: Adults, Children; airfareplanet: Adult, Child, Infant; Airtravel; British: Seniors; Economytravel: Infants.
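One way to read "combine multiple rows of consistent labels" is a greedy merge: start from a row, then repeatedly add any row that agrees with the current solution on some cluster (the intersect part) and contributes labels for clusters still unlabeled (the union part). The sketch below is our illustration of that idea, not the paper's exact intersect-and-union algorithm. Rows `aa` and `airfareplanet` follow the slide's first two rows; `other_site` is a hypothetical row added so that all four clusters can be covered.

```python
from typing import Callable, Dict, Optional

Row = Dict[str, str]  # cluster -> label given by one source interface


def combine_rows(rows: Dict[str, Row], clusters: set,
                 match: Callable[[str, str], bool] = lambda a, b: a == b) -> Optional[Row]:
    """Greedily merge rows that agree on a shared cluster until every cluster has a label.

    Returns None if no chain of linked rows covers all clusters.
    """
    for seed in sorted(rows, key=lambda r: -len(rows[r])):  # try fuller rows first
        solution = dict(rows[seed])
        used = {seed}
        progress = True
        while progress and set(solution) != clusters:
            progress = False
            for name, row in rows.items():
                if name in used:
                    continue
                # "intersect": the row must agree with the solution on some cluster
                linked = any(c in solution and match(solution[c], label)
                             for c, label in row.items())
                # "union": it contributes labels for clusters still unlabeled
                new = {c: label for c, label in row.items() if c not in solution}
                if linked and new:
                    solution.update(new)
                    used.add(name)
                    progress = True
        if set(solution) == clusters:
            return solution
    return None


GROUP = {"c_Senior", "c_Adult", "c_Child", "c_Infant"}
ROWS = {
    "aa": {"c_Adult": "Adults", "c_Child": "Children"},
    "airfareplanet": {"c_Adult": "Adult", "c_Child": "Child", "c_Infant": "Infant"},
    "other_site": {"c_Senior": "Seniors", "c_Adult": "Adults", "c_Infant": "Infants"},  # hypothetical
}
print(combine_rows(ROWS, GROUP))
```

With exact string matching, the singular labels of airfareplanet never link to the plural rows, so the result is the all-plural labeling Seniors, Adults, Children, Infants. Passing a looser `match` (e.g., comparing normalized labels) corresponds to the weaker equality and synonymy levels of consistency.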

14 Consistent Labeling of Groups of Fields
Levels of consistency. String level: two distinct tuples belong to this level of consistency if they have the same label (identical strings) for a cluster in the group relation. Equality level: two distinct tuples belong to this level if they have equal labels for a cluster (e.g., Adult and Adults, which are equal after normalization). Synonymy level: two distinct tuples belong to this level if they have synonymous labels for a cluster. Example group relation (clusters c_NumConnections, c_TicketClass, c_Airline): aa: NonStop, Choose an Airline; airfare: Number of Connections, Airline Preference; alldest: Class of Ticket, Preferred Airline; cheap: Max Number of Stops; msn: Class, Airline.
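The three levels can be checked from strictest to weakest. The sketch below assumes that "same" means identical strings, that "equal" means identical sets of content words after a crude normalization, and that "synonym" means the word sets pair up through a thesaurus; `SYNSETS` here is a hypothetical two-entry stand-in for WordNet. `tuple_level` reports the strictest level at which two rows of a group relation agree on some shared cluster.

```python
import re
from itertools import permutations
from typing import Dict, Optional

# Hypothetical thesaurus entries (word-level synonym sets).
SYNSETS = [{"airline", "carrier"}, {"class", "cabin"}]
LEVELS = ["string", "equality", "synonymy"]  # strictest first


def content_words(label: str) -> frozenset:
    """Crude normalization: lower-case words, minus stop words, with a plural 's' removed."""
    words = re.findall(r"[a-z]+", label.lower())
    return frozenset(w[:-1] if len(w) > 3 and w.endswith("s") and not w.endswith("ss") else w
                     for w in words if w not in {"a", "an", "of", "the"})


def synonymous(w1: frozenset, w2: frozenset) -> bool:
    if len(w1) != len(w2):
        return False
    first = sorted(w1)
    return any(all(a == b or any({a, b} <= s for s in SYNSETS) for a, b in zip(first, perm))
               for perm in permutations(sorted(w2)))


def label_level(l1: str, l2: str) -> Optional[str]:
    """Strictest level at which two labels agree, or None."""
    if l1 == l2:
        return "string"
    n1, n2 = content_words(l1), content_words(l2)
    if n1 == n2:
        return "equality"
    if synonymous(n1, n2):
        return "synonymy"
    return None


def tuple_level(t1: Dict[str, str], t2: Dict[str, str]) -> Optional[str]:
    """Strictest level at which two group-relation tuples agree on some shared cluster."""
    found = {label_level(t1[c], t2[c]) for c in t1.keys() & t2.keys()}
    return next((level for level in LEVELS if level in found), None)


msn = {"c_TicketClass": "Class", "c_Airline": "Airline"}
print(label_level("Class of Ticket", "Class of Tickets"))  # equality
print(label_level("Airline", "Carrier"))                   # synonymy
```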

15 Consistent Labeling of Internal Nodes
The problem: given an internal node in the integrated interface, determine a label that is semantically suitable for it, i.e., whose semantics is rich enough to cover the semantics of all its descendant leaf nodes. Example: a fragment of the integrated interface of the Real Estate domain.

16 Consistent Labeling of Internal Nodes
In assigning labels to internal nodes we mainly exploit two types of knowledge: (1) the semantic relationships among the labels of the internal nodes in the individual schema trees; (2) the relationships between internal nodes of source schema trees with overlapping sets of descendant leaves. These two types of knowledge are used to derive a set of logical inference rules among the textual labels, some of which are exemplified next.

17 Consistent Labeling of Internal Nodes
First logical inference: informally, consider two internal nodes v1 and v2 of two distinct source schema trees such that v1’s set of descendant leaves is a subset of v2’s set of descendant leaves, and v1’s label is a hypernym of v2’s label. Then the labels of the two nodes are semantically equivalent within the given domain of discourse. An example (figure).
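The first rule can be written as a predicate over two source nodes. In this sketch each internal node is summarized by its source interface, its label, and the clusters of its descendant leaves; the `HYPERNYMS` table (with the pair Location/Address) and the site names are hypothetical, standing in for a thesaurus lookup and real interfaces.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical (hypernym, hyponym) pairs standing in for a thesaurus lookup.
HYPERNYMS = {("location", "address")}


@dataclass(frozen=True)
class SourceNode:
    source: str              # source query interface the node belongs to
    label: str               # label of the internal node
    leaves: FrozenSet[str]   # clusters of its descendant leaves


def is_hypernym(general: str, specific: str) -> bool:
    return (general.lower(), specific.lower()) in HYPERNYMS


def first_inference(v1: SourceNode, v2: SourceNode) -> bool:
    """True if the rule lets us treat the labels of v1 and v2 as equivalent in this domain."""
    return (v1.source != v2.source
            and v1.leaves <= v2.leaves
            and is_hypernym(v1.label, v2.label))


v1 = SourceNode("site_a", "Location", frozenset({"c_City", "c_State"}))
v2 = SourceNode("site_b", "Address", frozenset({"c_Street", "c_City", "c_State", "c_Zip"}))
print(first_inference(v1, v2))  # True
```

The intuition: the more general label covers no more fields than the specific one, so in this domain it cannot mean anything broader.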

18 Consistent Labeling of Internal Nodes
Second logical inference (the idea): the same label is assigned to internal nodes in multiple source query interfaces, and the descendant leaves of each such internal node are among those of the internal node in the integrated interface for which a label is sought. An example: a fragment of the integrated query interface and the corresponding nodes within the source query interfaces.
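The second rule collects labels that several sources agree on. The sketch assumes "multiple" means at least two distinct interfaces (the `min_sources` parameter) and compares labels as exact strings; the node data and site names are made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set


@dataclass(frozen=True)
class SourceNode:
    source: str              # source query interface the node belongs to
    label: str               # label of the internal node
    leaves: FrozenSet[str]   # clusters of its descendant leaves


def second_inference(target_leaves: FrozenSet[str], nodes: Iterable[SourceNode],
                     min_sources: int = 2) -> Set[str]:
    """Labels shared by internal nodes of at least `min_sources` interfaces whose
    descendant leaves all fall under the integrated node being labeled."""
    support = defaultdict(set)  # label -> sources using it
    for node in nodes:
        if node.leaves <= target_leaves:
            support[node.label].add(node.source)
    return {label for label, sources in support.items() if len(sources) >= min_sources}


target = frozenset({"c_Senior", "c_Adult", "c_Child", "c_Infant"})
nodes = [
    SourceNode("site_a", "Passengers", frozenset({"c_Adult", "c_Child"})),
    SourceNode("site_b", "Passengers", frozenset({"c_Adult", "c_Senior", "c_Infant"})),
    SourceNode("site_c", "Travelers", frozenset({"c_Adult"})),
    SourceNode("site_d", "Passengers", frozenset({"c_Adult", "c_Class"})),
]
print(second_inference(target, nodes))  # {'Passengers'}
```

Here site_d does not count because one of its leaves (c_Class) lies outside the integrated node, and Travelers is used by only one source.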

19 Consistent Labeling of Internal Nodes
Third logical inference (hypernymy scenario): informally, consider two internal nodes v1 and v2 of two distinct source schema trees such that v1’s label is a hypernym of v2’s label. Then v1’s label semantically covers the union of the descendant nodes of the two nodes. An example: a fragment of the integrated query interface and the corresponding nodes within the source query interfaces.
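The third rule derives a coverage fact rather than an equivalence: the hypernym label is taken to cover the fields under both nodes. A minimal sketch, again with a hypothetical `HYPERNYMS` table and leaf sets given as clusters:

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

HYPERNYMS = {("location", "address")}  # hypothetical (hypernym, hyponym) pairs


@dataclass(frozen=True)
class SourceNode:
    source: str              # source query interface the node belongs to
    label: str               # label of the internal node
    leaves: FrozenSet[str]   # clusters of its descendant leaves


def third_inference(v1: SourceNode, v2: SourceNode) -> Optional[Tuple[str, FrozenSet[str]]]:
    """If v1's label is a hypernym of v2's, return v1's label with the leaves it covers."""
    if v1.source != v2.source and (v1.label.lower(), v2.label.lower()) in HYPERNYMS:
        return v1.label, v1.leaves | v2.leaves
    return None


v1 = SourceNode("site_a", "Location", frozenset({"c_City", "c_State"}))
v2 = SourceNode("site_b", "Address", frozenset({"c_Street", "c_Zip"}))
label, covered = third_inference(v1, v2)
print(label, sorted(covered))  # Location ['c_City', 'c_State', 'c_Street', 'c_Zip']
```

A coverage fact like this one can then justify using Location for an integrated node whose leaves fall within the union.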

20 Where can the instances help?
Discarding labels that are values: this problem is known as schema element name as value [Xu03, Dhamankar04]. For example, in the Book domain, labels like Hardcover or Paperback are data instances of fields with labels like Format or Binding. Reconciling most general vs. most descriptive labels: the idea is to bind the meaning of the most general label to a more descriptive one.
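A simple way to use instances for the first task is to drop any candidate label that also appears among the instance values of some field, as Hardcover does under Format. The sketch below assumes instance values are available per field (e.g., from selection lists) and compares them case-insensitively; the data is illustrative only.

```python
from typing import Dict, Iterable, List, Set


def labels_that_are_values(labels: Iterable[str], instances: Dict[str, List[str]]) -> Set[str]:
    """Labels that also occur as an instance value of some field (case-insensitive)."""
    values = {v.strip().lower() for vals in instances.values() for v in vals}
    return {label for label in labels if label.strip().lower() in values}


# Illustrative instance data: the Format field's values as they might appear in a drop-down.
instances = {"Format": ["Hardcover", "Paperback", "Audio CD"], "Author": []}
candidates = ["Format", "Binding", "Hardcover", "Paperback", "Author"]
discard = labels_that_are_values(candidates, instances)
kept = [c for c in candidates if c not in discard]
print(kept)  # ['Format', 'Binding', 'Author']
```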

21 Experiment Setup: Seven Real-World Domains
Columns: # interfaces, avg. # fields per interface, avg. # internal nodes per interface, avg. depth of interfaces.
Airfare: 20, 10.7, 5.1, 3.6
Automobile: 1.7, 2.4
Book: 5.4, 1.3, 2.3
Job: 4.6, 1.1, 2.1
Real Estate: 6.5, 2.7
Car Rentals: 10.4, 2.5
Hotels: 30, 7.6
These domains are also used in Wu04 et al, Madhavan05 et al, and Dragut06 et al.

22 Human Acceptance Ignoring Inherited Ambiguity
Experiment: human acceptance. Questions asked: Do you have any difficulty in filling in an entry for each field? If you do, identify the fields you had difficulty filling in. Are the fields understandable on the source interfaces? The 11 survey respondents reported the following:
Columns: labeling quality, human acceptance, human acceptance ignoring inherited ambiguity.
Airfare: 53.0%, 96.6%, 98.3%
Automobile: 79.7%, 100.0%
Book: 83.3%, 98.9%
Job: 80.0%
Real Estate: 79.1%, 97.8%
Car Rentals: 52.5%, 97.9%, 98.2%
Hotels: 70.1%, 95.3%, 96.1%

23 Example Integrated Interfaces
Figure: the Airfare domain integrated interface and the source query interface. Four people found the group confusing.

24 Example Integrated Interfaces
Figure: the Auto domain integrated interface. No surveyed person identified any problem with this integrated query interface.

25 Thank you for your time and patience!
Please visit the project web site.

