Filtering Properties of Entities By Class


Filtering Properties of Entities By Class. Xiangqian Lee, Yuzhong Qu, Gong Cheng. Websoft Research Group

Contents 1 Introduction 2 Problem Definition 3 A Primitive Approach 4 Improvements 5 Evaluation 6 Conclusion & Future Work

1 Introduction 1.1 Motivation

1 Introduction/1.1 Motivation RDF data graph. Characteristics: 1) interconnected by nature, 2) able to be reused. Traditional document-centric browsing patterns do not fit well. Challenges exist: presenting data in a way users are used to reading and understanding; taking full advantage of the characteristics of the data.

1 Introduction/1.1 Motivation Provide filtering of properties. Provide organization and navigation. Filter by human-friendly labels. We call such a filter a Perspective: the current entity is viewed from different perspectives. Our work: an automated approach, utilized for entity browsing, using class hierarchies, and adopting empirical predictions to improve performance.

2 Problem Definition Input: {E, FTriples, BTriples}. Output: {Persps, PLabels}. E: the set of co-referenced entity URIs. FTriples: triples whose subject is an element of E. BTriples: triples whose object is an element of E. Persps: the set of Perspectives. PLabels: labels in a one-to-one relationship with the elements of Persps.

2 Problem Definition Goal: The number of Perspectives is appropriate. The label of each Perspective is understandable to humans. The contents under each Perspective are related to its label.

3 A Primitive Approach 3.1 Overview 3.2 Generate Primitive Perspectives 3.3 Integrating By Class Hierarchy 3.4 Labeling Perspectives

3 A Primitive Approach/3.1 Overview

3 A Primitive Approach/3.2 Generating Primitive Perspectives Process: a triple in FTriples → the domain class of its property; a triple in BTriples → the range class of its property. Equivalence relation: two triples are equivalent if the domain (or range) of their properties is the same class. Persps: the quotient set obtained under this equivalence relation. All explicitly stated types of any element in E are also added to the Perspectives. Multi-domain or multi-range: the triple is added to all corresponding Perspectives.
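The grouping step above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `primitive_perspectives`, the dictionaries `domains` and `ranges` (property to list of classes), and the representation of triples as 3-tuples are all assumptions made for the example.

```python
from collections import defaultdict

def primitive_perspectives(ftriples, btriples, domains, ranges, entity_types):
    """Group triples into Perspectives keyed by class (hypothetical helper).

    ftriples/btriples: lists of (subject, property, object) tuples.
    domains/ranges: dict mapping a property to its declared classes.
    entity_types: explicitly stated types of the entities in E.
    """
    persps = defaultdict(list)
    for triple in ftriples:
        prop = triple[1]
        # Multi-domain: the triple goes into every corresponding Perspective.
        for cls in domains.get(prop, ()):
            persps[cls].append(triple)
    for triple in btriples:
        prop = triple[1]
        # Multi-range: same treatment for backward triples.
        for cls in ranges.get(prop, ()):
            persps[cls].append(triple)
    # Explicit types of the entity become Perspectives even if still empty.
    for cls in entity_types:
        persps.setdefault(cls, [])
    return dict(persps)
```

Triples whose property has no declared domain or range fall into no Perspective here, which is the gap that the later improvements address.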

3 A Primitive Approach/3.3 Integrating By Class Hierarchy Fact: We have: <Shanghai dbpedia:areaTotal 6340500000>. <dbpedia:areaTotal rdfs:domain dbpedia:place>. <dbpedia:populatedPlace rdfs:subClassOf dbpedia:place>. Then, R3 rdf:type dbpedia:populatedPlace. R3 rdf:type dbpedia:place. So dbpedia:areaTotal is suitable to describe R3 as a populatedPlace.

3 A Primitive Approach/3.3 Integrating By Class Hierarchy In short: rdfs:subClassOf is a partial order relation, so Persps is a partially ordered set. We want the minimal elements of this set. For each minimal element, integrate into it all triples from the Perspectives of its super classes. In practice: Iterate over each Perspective. Integrate into it all the triples from the Perspectives whose class is a super class of its class. Remove the Perspectives that have no triples and the Perspectives whose triples have been integrated into another one.
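The practical procedure above might look roughly like the following. This is a sketch under assumptions: `integrate_by_hierarchy` is a hypothetical name, and `superclasses` is assumed to already map each class to the set of all its (transitive) super classes, which the slides do not specify.

```python
def integrate_by_hierarchy(persps, superclasses):
    """Merge super-class Perspectives into their sub-class Perspectives.

    persps: dict mapping a class to its list of triples.
    superclasses: dict mapping a class to the set of its super classes
                  (assumed to be transitively closed).
    """
    merged = {}
    absorbed = set()
    for cls, triples in persps.items():
        combined = list(triples)
        for sup in superclasses.get(cls, set()):
            if sup != cls and sup in persps:
                # Pull in the super class's triples, skipping duplicates.
                for t in persps[sup]:
                    if t not in combined:
                        combined.append(t)
                absorbed.add(sup)
        merged[cls] = combined
    # Drop empty Perspectives and those whose triples were taken.
    return {c: ts for c, ts in merged.items() if ts and c not in absorbed}
```

With the Shanghai example, the triples under dbpedia:place would move into the dbpedia:populatedPlace Perspective, and the dbpedia:place Perspective would disappear.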

3 A Primitive Approach/3.4 Labeling Perspectives Class namespace → class label. If the label cannot be obtained, the local name of the class is used instead.
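The fallback rule above can be illustrated as follows. This is a minimal sketch: `labels` stands in for whatever lookup returns a class label (the slides do not say how it is fetched), and taking the local name as the part after the last `#` or `/` is a common convention assumed here, not something the slides state.

```python
def local_name(uri):
    """Return the part of a URI after its last '#' or '/' (assumed convention)."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[cut + 1:]

def perspective_label(class_uri, labels):
    """Use the class label when available, otherwise fall back to the local name.

    labels: dict mapping class URIs to labels (hypothetical stand-in
            for a real label lookup).
    """
    label = labels.get(class_uri)
    return label if label else local_name(class_uri)
```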

4 Improvements 4.1 Overview 4.2 Improvements by Empirical Predictions 4.3 A Progressive Approach 4.4 An Approach using External Resources

4 Improvements/4.1 Overview Problem: a substantial number of triples have properties whose domain or range is absent. Solution: an empirical approach.

4 Improvements/4.2 Improvements By An Empirical Approach Suppose we already have all the triples of the data on the Web. For each property, count the frequencies of the types whose instances are the subject (object) of that property. Frequency record: <Property, Class Type, Frequency>. Two kinds of records are generated: one for predicting domains and one for predicting ranges. When a browsing request arrives, for each property with domain or range absent: the predicted class type should be related to the current entity; a class type with a higher frequency is predicted with more confidence.
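Counting the frequency records described above could be sketched like this. The function name `frequency_records`, the `types_of` dictionary (entity to its types), and the `position` parameter are assumptions made for illustration; the slides only fix the record shape <Property, Class Type, Frequency>.

```python
from collections import Counter

SUBJECT, OBJECT = 0, 2  # tuple positions inside a (s, p, o) triple

def frequency_records(triples, types_of, position):
    """Count how often instances of each class occur with each property.

    position=SUBJECT yields records for predicting domains,
    position=OBJECT yields records for predicting ranges.
    types_of: dict mapping an entity to its class types (hypothetical).
    Returns a Counter keyed by (property, class type).
    """
    freq = Counter()
    for triple in triples:
        prop = triple[1]
        for cls in types_of.get(triple[position], ()):
            freq[(prop, cls)] += 1
    return freq
```

Calling it once with `SUBJECT` and once with `OBJECT` gives the two kinds of records mentioned on the slide.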

4 Improvements/4.2 Improvements By An Empirical Approach Two preset thresholds: MIN-FREQ: the minimum frequency value allowed. MAX-FREQ-INTERVAL-PERCENT: the interval allowed between adjacent records, as a percentage of the max frequency value of a selected class type. Requirements in detail: The class type of a record must be one of the existing Perspectives or a super class of at least one of them. Its frequency must be higher than MIN-FREQ. All records are sorted in descending order of frequency. The highest records that satisfy the above requirements, with the interval between two adjacent records lower than the threshold value, are selected.
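One possible reading of this selection rule is sketched below. The slide leaves the exact meaning of MAX-FREQ-INTERVAL-PERCENT open; this sketch assumes it is a fraction of the highest eligible frequency and that selection stops at the first gap that reaches it. The function name `predict_classes` and the parameter `candidate_classes` (existing Perspective classes plus their super classes) are hypothetical.

```python
def predict_classes(records, candidate_classes, min_freq, max_interval_percent):
    """Select predicted class types for one property.

    records: list of (class type, frequency) pairs for the property.
    candidate_classes: classes of existing Perspectives and their super classes.
    max_interval_percent: allowed gap between adjacent records, given as a
        fraction of the top frequency (an assumed interpretation).
    """
    eligible = sorted(
        ((cls, f) for cls, f in records
         if f > min_freq and cls in candidate_classes),
        key=lambda r: r[1],
        reverse=True,
    )
    if not eligible:
        return []
    max_gap = eligible[0][1] * max_interval_percent
    selected = [eligible[0][0]]
    # Walk down the sorted list until the gap between neighbours is too large.
    for (_, prev), (cls, cur) in zip(eligible, eligible[1:]):
        if prev - cur >= max_gap:
            break
        selected.append(cls)
    return selected
```

For instance, with frequencies 100, 90 and 40 and a fraction of 0.2, the first two class types are kept and the third is cut off by the gap of 50.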

4 Improvements/4.3 A Progressive Approach To realize the previous assumption, collect the frequency records progressively. During each browsing request, collect from the triples with domain or range absent: <Property, Type Class, Entity URI>. Triples with the same subject (object) collected in different browsing requests contribute only one count to the frequency. Obviously, improvement with this approach is slow.

4 Improvements/4.4 An Approach Using External Resources For the experiments, we use data collected by Falcons Object Search: more than 4 billion triples from all over the world. We count the frequencies for properties with domain or range absent, using the triples of a subset of all entities.
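The deduplication rule above (one count per entity, however many requests see it) can be sketched with a set of observed records. The class name `ProgressiveCounter` and its methods are hypothetical; only the record shape <Property, Type Class, Entity URI> comes from the slide.

```python
class ProgressiveCounter:
    """Collect <property, type class, entity URI> records across requests."""

    def __init__(self):
        # A set ensures a repeated observation is stored only once.
        self.seen = set()

    def observe(self, prop, type_class, entity_uri):
        """Record one observation made during a browsing request."""
        self.seen.add((prop, type_class, entity_uri))

    def frequency(self, prop, type_class):
        """Number of distinct entities observed for this property and class."""
        return sum(1 for p, c, _ in self.seen if p == prop and c == type_class)
```

Observing the same entity in two different browsing requests leaves the frequency unchanged, which matches the rule that it contributes only once.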

5 Evaluation 5.1 The Effectiveness of the Improvements 5.2 Utility and Intelligibility Evaluation 5.3 Weakness and Possible Reasons

5 Evaluation/5.1 The Effectiveness of the Improvements 2,400 entities from LinkedMDB: 76.076% → 24.1624%. Superset of entities for the next experiments containing 30 entities: 84.7975201% → 38.733062%.

5 Evaluation/5.2 Utility and Intelligibility Evaluation Five criteria: The number of Perspectives should be appropriate. The labels of Perspectives should be human friendly. The contents of a Perspective are related to it. Different Perspectives are well discriminated from one another. The approach as a whole helps users browse the information. Experiments: 12 participants and 9 randomly selected popular entities. Participants browse each Perspective of each entity and tag every InfoItem that does not belong in its Perspective. They then fill in a questionnaire with the five criteria as questions.

5 Evaluation/5.2 Utility and Intelligibility Evaluation Average: 0.130677832

5 Evaluation/5.2 Utility and Intelligibility Evaluation The discrimination is not good. The relevance of the contents to the Perspective they belong to is not bad: on average, 86% of InfoItems are related to their Perspective. The labels of Perspectives are not hard to understand but may not meet all users’ expectations.

5 Evaluation/5.3 Weakness and Possible Reasons The low discrimination: Partly caused by the strategies of our approach. Semantically related Perspectives do not have a relation that can be retrieved from the schema. Source data quality: some domains or ranges are semantically unrelated to the entity.

6 Conclusion & Future Work Our approach can help organize and navigate large-scale information while browsing an entity. Our experiments have illustrated its utility and intelligibility. We still need to improve the discrimination between Perspectives and the coverage of our approach.

Thank you!