Filtering Properties of Entities By Class


1 Filtering Properties of Entities By Class
Xiangqian Lee, Yuzhong Qu, Gong Cheng, Websoft Research Group

2 Contents
1 Introduction; 2 Problem Definition; 3 A Primitive Approach; 4 Improvements; 5 Evaluation; 6 Conclusion & Future Work

3 1 Introduction 1.1 Motivation

4 1 Introduction/1.1 Motivation
RDF data graph. Characteristics: 1) its interconnected nature, 2) its ability to be reused. Traditional document-centric patterns don't fit well. Challenges exist: presenting the data in a way users are used to reading and understanding, and taking full advantage of the characteristics of the data.

5 1 Introduction/1.1 Motivation
Provide property filtering. Provide organization and navigation. Filter by human-friendly labels; we call each such filter a Perspective. View the current entity from different perspectives. Our work: an automated approach, used for entity browsing, that uses class hierarchies and adopts empirical predictions to improve performance.

6 2 Problem Definition
Input: {E, FTriples, BTriples}
Output: {Persps, PLabels}
E: the set of co-referenced entity URIs. FTriples: triples whose subject is an element of E. BTriples: triples whose object is an element of E. Persps: the set of Perspectives. PLabels: labels in one-to-one correspondence with the elements of Persps.
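As a minimal sketch of the input side of this definition, the following Python splits a list of triples into FTriples and BTriples for a given set E. The `Triple` type, the `split_triples` helper, and the `ex:` identifiers are illustrative assumptions, not part of the original slides.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Triple(NamedTuple):
    """A plain RDF triple with string terms (hypothetical representation)."""
    subject: str
    predicate: str
    obj: str


def split_triples(entity_uris: Set[str],
                  triples: Iterable[Triple]) -> Tuple[List[Triple], List[Triple]]:
    """Return (FTriples, BTriples) for the co-referenced entity URIs in E."""
    triples = list(triples)
    # FTriples: the subject is an element of E.
    ftriples = [t for t in triples if t.subject in entity_uris]
    # BTriples: the object is an element of E.
    btriples = [t for t in triples if t.obj in entity_uris]
    return ftriples, btriples


# Made-up example data: two co-referenced URIs for the same entity.
E = {"ex:Shanghai", "ex:Shanghai_alias"}
data = [
    Triple("ex:Shanghai", "ex:p", "ex:o"),
    Triple("ex:s", "ex:q", "ex:Shanghai_alias"),
    Triple("ex:a", "ex:b", "ex:c"),
]
ftriples, btriples = split_triples(E, data)
```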

7 2 Problem Definition
Goal: The number of Perspectives is appropriate. The label of each Perspective is human-understandable. The contents under each Perspective are related to its label.

8 3 A Primitive Approach
3.1 Overview; 3.2 Generate Primitive Perspectives; 3.3 Integrating By Class Hierarchy; 3.4 Labeling Perspectives

9 3 A Primitive Approach/3.1 Overview

10 3 A Primitive Approach/3.2 Generating Primitive Perspectives
Process: each triple in FTriples is mapped to the domain class of its property; each triple in BTriples is mapped to the range class of its property. Equivalence relation: two triples are equivalent when the domain (or range) of their properties is the same class. Persps: the quotient set obtained under this equivalence relation. Add all explicitly stated types of any element of E to the Perspectives. Multi-domain or multi-range: add the triple to all corresponding Perspectives.
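The grouping step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: triples are plain `(subject, property, object)` tuples, and `domains`, `ranges`, and `explicit_types` are hypothetical in-memory lookups standing in for whatever schema access the authors actually used.

```python
from typing import Dict, Iterable, List, Set, Tuple

TripleT = Tuple[str, str, str]  # (subject, property, object)


def primitive_perspectives(ftriples: Iterable[TripleT],
                           btriples: Iterable[TripleT],
                           domains: Dict[str, Set[str]],
                           ranges: Dict[str, Set[str]],
                           explicit_types: Iterable[str]) -> Dict[str, List[TripleT]]:
    """Group triples into Perspectives keyed by class.

    domains / ranges map a property to the classes declared as its
    rdfs:domain / rdfs:range (assumed to be available in memory).
    """
    persps: Dict[str, List[TripleT]] = {}
    # Every explicitly stated type gets a Perspective, even if it stays empty.
    for cls in explicit_types:
        persps.setdefault(cls, [])
    # Forward triples are grouped by the domain class of their property.
    for t in ftriples:
        for cls in domains.get(t[1], ()):
            persps.setdefault(cls, []).append(t)
    # Backward triples are grouped by the range class of their property.
    for t in btriples:
        for cls in ranges.get(t[1], ()):
            persps.setdefault(cls, []).append(t)
    return persps
```

A property with several domains or ranges puts its triple into each corresponding Perspective, and a property with no declared domain or range contributes nothing at this stage.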

11 3 A Primitive Approach/3.3 Integrating By Class Hierarchy
Fact: We have: <Shanghai dbpedia:areaTotal >. <dbpedia:areaTotal rdfs:domain dbpedia:place>. <dbpedia:populatedPlace rdfs:subClassOf dbpedia:place>. Then: R rdf:type dbpedia:populatedPlace. R3 rdf:type dbpedia:place. dbpedia:areaTotal is suitable for describing R3 as a populatedPlace.

12 3 A Primitive Approach/3.3 Integrating By Class Hierarchy
Simply put: rdfs:subClassOf is a partial order relation, so Persps is a partially ordered set. We want the minimal elements of this set. Integrate into each minimal element all triples from the Perspectives of its super classes. In practice: iterate over each Perspective. Integrate into it all triples from the Perspectives whose class is a super class of its class. Remove the Perspectives with no triples and the Perspectives whose triples have been taken for integration.
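One possible reading of the "in practice" procedure is sketched below. The `sub_class_of` mapping (class to its direct super classes) and the function names are assumptions made for illustration; the slides do not specify how the hierarchy is stored or traversed.

```python
from typing import Dict, List, Set, Tuple

TripleT = Tuple[str, str, str]


def superclasses(cls: str, sub_class_of: Dict[str, Set[str]]) -> Set[str]:
    """All strict super classes of cls (transitive closure of rdfs:subClassOf)."""
    seen: Set[str] = set()
    stack = list(sub_class_of.get(cls, ()))
    while stack:
        c = stack.pop()
        if c != cls and c not in seen:
            seen.add(c)
            stack.extend(sub_class_of.get(c, ()))
    return seen


def integrate(persps: Dict[str, List[TripleT]],
              sub_class_of: Dict[str, Set[str]]) -> Dict[str, List[TripleT]]:
    """Pull super-class triples down into each Perspective, then prune."""
    merged: Dict[str, List[TripleT]] = {}
    taken: Set[str] = set()  # Perspectives whose triples were integrated elsewhere
    for cls, triples in persps.items():
        combined = list(triples)
        for sup in superclasses(cls, sub_class_of):
            if sup in persps:
                combined.extend(t for t in persps[sup] if t not in combined)
                taken.add(sup)
        merged[cls] = combined
    # Drop empty Perspectives and those that were merged into a sub class.
    return {c: ts for c, ts in merged.items() if ts and c not in taken}
```

With the slide's example, a Perspective for dbpedia:place holding the areaTotal triple would be folded into the dbpedia:populatedPlace Perspective, leaving only the more specific one.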

13 3 A Primitive Approach/3.4 Labeling Perspectives
Look up the class label in the class namespace. If the label cannot be retrieved, the local name is used instead.
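The fallback from label to local name might look like this in Python. The `labels` dictionary is a stand-in for whatever label lookup is actually performed, and the local-name rule (text after the last `#`, `/`, or `:`) is a common convention assumed here, not something the slides state.

```python
from typing import Dict


def local_name(uri: str) -> str:
    """Return the part of a URI after the last '#', '/' or ':' separator."""
    trimmed = uri.rstrip("#/")
    for sep in ("#", "/", ":"):
        if sep in trimmed:
            return trimmed.rsplit(sep, 1)[1]
    return trimmed


def label_for(cls: str, labels: Dict[str, str]) -> str:
    """Prefer the class label; fall back to the local name if none is found."""
    label = labels.get(cls)
    return label if label else local_name(cls)


# Hypothetical label table with a single known label.
known = {"http://dbpedia.org/ontology/Place": "place"}
with_label = label_for("http://dbpedia.org/ontology/Place", known)
without_label = label_for("http://dbpedia.org/ontology/PopulatedPlace", known)
```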

14 4 Improvements
4.1 Overview; 4.2 Improvements by Empirical Predictions; 4.3 A Progressive Approach; 4.4 An Approach using External Resources

15 4 Improvements/4.1 Overview
Problem: a substantial number of triples have properties whose domain or range is absent. Solution: an empirical approach.

16 4 Improvements/4.2 Improvements By An Empirical Approach
Suppose we already have all triples in the data of the Web. For each property, count the frequencies of the types whose instances are the subject (object) of this property. Frequency record: <Property, Class Type, Frequency>. Generate two kinds of records: one for predicting the domain and one for predicting the range. When a browsing request arrives on the fly, for each property whose domain or range is absent: the predicted class type should be related to the current entity, and a class type with a higher frequency is more likely to be one of the predicted ones.
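Counting the two kinds of frequency records could be sketched as below. The `types_of` mapping (resource to its rdf:type classes) and the function name are assumptions for illustration; literals and untyped resources simply have no entry and are skipped.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

TripleT = Tuple[str, str, str]  # (subject, property, object)


def frequency_records(triples: Iterable[TripleT],
                      types_of: Dict[str, Set[str]]) -> Tuple[Counter, Counter]:
    """Build <Property, Class Type, Frequency> records for domain and range.

    Returns two Counters keyed by (property, class): one counting the types
    of subjects (domain prediction), one the types of objects (range prediction).
    """
    domain_freq: Counter = Counter()
    range_freq: Counter = Counter()
    for s, p, o in triples:
        for cls in types_of.get(s, ()):
            domain_freq[(p, cls)] += 1
        for cls in types_of.get(o, ()):
            range_freq[(p, cls)] += 1
    return domain_freq, range_freq
```

A record such as `domain_freq[("ex:director", "ex:Film")]` then gives how often instances of that class appeared as the subject of the property.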

17 4 Improvements/4.2 Improvements By An Empirical Approach
Two preset thresholds: MIN-FREQ: the minimum frequency value allowed. MAX-FREQ-INTERVAL-PERCENT: the interval allowed between records, as a percentage of the maximum frequency value of a selected class type. Requirements in detail: The class type of the record should be one of the existing Perspectives or a super class of at least one of them. All records are sorted in descending order. The frequency is higher than MIN-FREQ. The highest records satisfying the above requirements, with the interval between two adjacent records lower than the threshold value, are selected.
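The selection rule is stated tersely, so the sketch below is only one interpretation: the allowed gap between adjacent records is taken to be MAX-FREQ-INTERVAL-PERCENT of the highest qualifying frequency, and selection stops at the first gap that reaches it. The function and parameter names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def select_predicted_classes(records: Iterable[Tuple[str, int]],
                             related_classes: Set[str],
                             min_freq: int,
                             max_interval_percent: float) -> List[str]:
    """Pick predicted classes for one property from (class, frequency) records.

    related_classes: classes of existing Perspectives plus their super classes.
    """
    # Keep only related classes above MIN-FREQ, sorted by descending frequency.
    candidates = sorted(
        ((cls, f) for cls, f in records if cls in related_classes and f > min_freq),
        key=lambda rec: rec[1],
        reverse=True,
    )
    if not candidates:
        return []
    # Assumed meaning of the threshold: a share of the top frequency.
    max_gap = candidates[0][1] * max_interval_percent / 100.0
    selected = [candidates[0][0]]
    for (_, prev_freq), (cls, freq) in zip(candidates, candidates[1:]):
        if prev_freq - freq >= max_gap:
            break  # the drop is too large; stop selecting here
        selected.append(cls)
    return selected


# Made-up records for a single property.
recs = [("ex:Film", 100), ("ex:Work", 90), ("ex:Person", 40), ("ex:Thing", 5)]
picked = select_predicted_classes(recs, {"ex:Film", "ex:Work", "ex:Person", "ex:Thing"}, 10, 20)
```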

18 4 Improvements/4.3 A Progressive Approach
To realize the previous assumption: collect the frequency records progressively. During each browsing request, collect records from the triples whose property has its domain or range absent: <Property, Type Class, Entity URI>. Triples with the same subject (object) collected in different browsing requests contribute only one count to the frequency. Obviously, this approach is slow to improve.
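The rule that the same entity counts only once across browsing requests can be illustrated with a small accumulator. The class name and method are hypothetical, and in-memory sets stand in for whatever persistent store would be needed in practice.

```python
from collections import Counter
from typing import Set, Tuple


class ProgressiveFrequencies:
    """Accumulate <Property, Type Class, Entity URI> observations over requests."""

    def __init__(self) -> None:
        # Observations already counted, so repeats add nothing.
        self._seen: Set[Tuple[str, str, str]] = set()
        # Frequency per (property, class).
        self.freq: Counter = Counter()

    def observe(self, prop: str, cls: str, entity_uri: str) -> bool:
        """Record one observation; return True only if it was new."""
        key = (prop, cls, entity_uri)
        if key in self._seen:
            return False
        self._seen.add(key)
        self.freq[(prop, cls)] += 1
        return True


store = ProgressiveFrequencies()
# First browsing request sees ex:f1 as a subject of ex:director.
store.observe("ex:director", "ex:Film", "ex:f1")
# A later request sees the same entity again, plus a new one.
store.observe("ex:director", "ex:Film", "ex:f1")
store.observe("ex:director", "ex:Film", "ex:f2")
```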

19 4 Improvements/4.4 An Approach Using External Resources
For experiments. Use data collected by Falcons Object Search. More than 4 billion triples from all over the world. Count the frequencies of properties with domain or range absent from triples of a subset of all entities.

20 5 Evaluation
5.1 The Effectiveness of the Improvements; 5.2 Utility and Intelligibility Evaluation; 5.3 Weakness and Possible Reasons

21 5 Evaluation/5.1 The Effectiveness of the Improvements
2,400 entities from LinkedMDB: 76.076%  % Superset of entities for the next experiments containing 30 entities: %  %

22 5 Evaluation/5.2 Utility and Intelligibility Evaluation
Five criteria: The number of Perspectives should be appropriate. The labels of Perspectives should be human-friendly. The contents of a Perspective are related to it. Different Perspectives are well discriminated. The approach as a whole helps users browse the information. Experiments: 12 participants and 9 randomly selected popular entities. Participants browse each Perspective of each entity and tag every InfoItem that does not belong in its Perspective. They then fill in a questionnaire with the five criteria as questions.

23 5 Evaluation/5.2 Utility and Intelligibility Evaluation
Average:

24 5 Evaluation/5.2 Utility and Intelligibility Evaluation
The discrimination is not good. The relevance between the contents and the perspective they belong to is not bad. On average, 86% of InfoItems are related to their perspective. The labels of Perspectives are not hard to understand but may not fulfill all users' expectations.

25 5 Evaluation/5.3 Weakness and Possible Reasons
The low discrimination: Partly caused by the strategies of our approach. Semantically related Perspectives don't have a relation that can be retrieved from the schema. Source data quality: some domains or ranges are semantically unrelated to the entity.

26 6 Conclusion & Future Work
Our approach can help organize and navigate large-scale information while browsing an entity. Our experiments have illustrated its utility and intelligibility. We still need to improve the discrimination between perspectives and the coverage of our approach.

27 Thank you!

