Ranking Relationships on the Semantic Web. Budak Arpinar. This work is funded by NSF-ITR-IDM Award #0325464, titled 'SemDIS: Discovering Complex Relationships in the Semantic Web'.

Presentation transcript:

Ranking Relationships on the Semantic Web
Budak Arpinar
This work is funded by NSF-ITR-IDM Award #0325464, titled 'SemDIS: Discovering Complex Relationships in the Semantic Web', and NSF-ITR-IDM Award# titled 'Semantic Association Identification and Knowledge Discovery for National Security Applications'.

Outline
- Background
- Motivation
- Ranking Approach
- System Implementation
- Ranking Evaluation
- Document Ranking
- Conclusions and Future Work

The Semantic Web*
- An extension of the Web
  - Ontologies are used to annotate the current information on the Web
  - RDF and OWL are the current W3C standards for metadata representation on the Semantic Web
- Allows machines to interpret the content on the Web in a more automated and efficient manner

* BERNERS-LEE, T., HENDLER, J., AND LASSILA, O. The Semantic Web. Scientific American (May 2001).

Semantic Web Technology Evaluation Ontology (SWETO)^
- Large-scale test-bed ontology containing instances extracted from heterogeneous Web sources
- Developed using Semagix Freedom*
  - Created the ontology within Freedom
  - Used extractors to extract knowledge and annotate it with respect to the ontology

^ Boanerges Aleman-Meza, Chris Halaschek, Amit Sheth, I. Budak Arpinar, and Gowtham Sannapareddy, SWETO: Large-Scale Semantic Web Test-bed, International Workshop on Ontology in Action, Banff, Canada, June 20-24, 2004.
* Semagix Inc. Homepage:

SWETO - Statistics
- Covers various domains
  - CS publications, geographic locations, terrorism, etc.
- Version 1.4 includes over 800,000 entities and over 1,500,000 explicit relationships among them

SWETO Schema - Visualization

Semantic Associations*
[Figure: entities A, B, and C connected by the property sequences x.y.z, x.y'.z', and u.v]
1. A is related to B by x.y.z
2. A is related to C by
   i. x.y'.z'
   ii. u.v (undirected path)
3. A is "related similarly" to B as it is to C ($y' \subset y$ and $z' \subset z \Rightarrow x.y.z \cong x.y'.z'$)
- So are B and C related?
- Mechanisms are needed for querying about and retrieving complex relationships between entities

* ANYANWU, K., AND SHETH, A. ρ-Queries: Enabling Querying for Semantic Associations on the Semantic Web. In Proceedings of the 12th International World Wide Web Conference (WWW-2003) (Budapest, Hungary, May 2003).

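Semantic Connectivity Example
[Figure: RDF graph with resources &r1, &r5, and &r6; properties worksFor, associatedWith, fname, lname, and name; literals "Chris", "Halaschek", "LSDIS Lab", and "The University of Georgia". The resources are marked as semantically connected.]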

RDF Instance Graph
[Figure: RDF schema and instance graph. Node labels include Client, Customer, Passenger, Ticket, Flight, Payment, Cash, CCard, Bank Account, FFlyer, String, and float; property labels include purchased, for, forflight, paidby, holder, creditedto, fflierno, ffid, FFNo, fname, lname, number, amount, and no; instances &r1 to &r9 and &r11 carry literals such as "John" "Smith", "Bill" "Jones", "Jeff" "Brown", and "XYZ123". Edge types: typeOf (instance), subClassOf (isA), subPropertyOf.]

ρ-path Problem
- Given:
  - An RDF instance graph G, vertices x and y in G, and an integer k
- Find:
  - All simple, undirected paths p that connect x and y with length(p) ≤ k
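The ρ-path problem can be illustrated with a small depth-first enumeration. This is a minimal sketch rather than the SemDIS implementation: it assumes the graph is given as (subject, property, object) triples, that every triple may be traversed in either direction, and that k bounds the number of edges on a path. The function name `simple_paths` and the sample triples are hypothetical.

```python
from collections import defaultdict


def simple_paths(triples, x, y, k):
    """Return all simple undirected paths from x to y with at most k edges.

    Each path is a list of (from_node, property, to_node) steps.
    """
    # Index every triple in both directions so edges can be walked undirected.
    adjacency = defaultdict(list)
    for subject, prop, obj in triples:
        adjacency[subject].append((prop, obj))
        adjacency[obj].append((prop, subject))

    results = []

    def dfs(node, visited, path):
        if node == y:
            results.append(list(path))
            return
        if len(path) == k:
            return  # length bound reached without hitting y
        for prop, neighbor in adjacency[node]:
            if neighbor not in visited:  # keep the path simple
                visited.add(neighbor)
                path.append((node, prop, neighbor))
                dfs(neighbor, visited, path)
                path.pop()
                visited.remove(neighbor)

    dfs(x, {x}, [])
    return results


# Hypothetical graph echoing the A/B example: one long path and one direct edge.
triples = [
    ("A", "x", "n1"), ("n1", "y", "n2"), ("n2", "z", "B"),
    ("A", "u", "B"),
]
paths = simple_paths(triples, "A", "B", 3)
```

With k = 3 both the three-step path and the direct edge are found; lowering k to 1 leaves only the direct edge.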

Motivation
- A query between "Hubwoo [Company]" and "SONERI [Bank]" results in 1,160 associations
- Users cannot be expected to sift through the resulting associations
- Results must be presented to users in a relevant fashion, which requires ranking

Observations
- Ranking associations is inherently different from ranking documents
  - An association is a sequence of complex relationships between entities in the metadata from multiple heterogeneous documents
  - There is no single way to measure the relevance of associations
- Need a flexible, query-dependent approach to relevantly rank the resulting associations

Ranking – Overview
- Define association rank as a function of several ranking criteria
- Two categories:
  - Semantic: based on semantics provided by the ontology
    - Context
    - Subsumption
    - Trust
  - Statistical: based on statistical information from the ontology, instances, and associations
    - Rarity
    - Popularity
    - Association Length

Context: What, Why, How?
- Context captures the users' interest, to provide them with the relevant knowledge among the numerous relationships between the entities
- Context => Relevance; reduction in computation space
- Specified by defining regions (or sub-graphs) of the ontology

Context Specification
- Topographic approach
  - Regions capture the user's interest
    - A region is a subset of classes (entities) and properties of an ontology
  - The user can define multiple regions of interest
    - Each region has a relevance weight

Context: Example
[Figure: ontology graph with entities e1:Person, e2:Financial Organization, e3:Organization, e4:Terrorist Organization, e5:Person, e6:Financial Organization, e7:Terrorist Organization, e8:Terrorist Attack, and e9:Location, linked by the relationships friend Of, member Of, located In, supports, has Account, works For, involved In, and at location.]
- Region 1: Financial Domain, weight = 0.25
- Region 2: Terrorist Domain, weight = 0.75

Context Issues
- Associations can pass through numerous regions of interest
- Large and/or small portions of associations can pass through these regions
- Associations outside context regions rank lower

Context Weight Formula
- Refer to the entities and relationships in an association generically as the components of the association
- Define the following sets, where $c \in R_i$ denotes that the type of c (rdf:type) belongs to context region $R_i$:

$$X_i = \{\, c \mid c \in R_i \wedge c \in A \,\}$$

$$Z = \{\, c \mid \forall i,\ 1 \le i \le n:\ c \notin R_i \wedge c \in A \,\}$$

  - n is the number of regions A passes through
  - $X_i$ is the set of components of A in the i-th region
  - Z is the set of components of A not in any contextual region

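Context Weight Formula
- Define the Context weight $C_A$ of a given association A such that

$$C_A = \frac{1}{length(A)} \left( \left( \sum_{i=1}^{n} r_i \times |X_i| \right) \times \left( 1 - \frac{|Z|}{length(A)} \right) \right)$$

  - n is the number of regions A passes through
  - $r_i$ is the relevance weight of the i-th region
  - length(A) is the number of components in the association
  - $X_i$ is the set of components of A in the i-th region
  - Z is the set of components of A not in any contextual region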
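The context weight can be sketched in a few lines of Python. The formula used here is an assumption, since the slide image is not preserved in this transcript: the region-weighted count of in-context components, scaled down by the fraction of components outside every region, and normalized by length(A). The function name `context_weight`, the type names, and the region contents are hypothetical; only the region weights 0.25 and 0.75 come from the example slide.

```python
def context_weight(component_types, regions):
    """Context weight of an association under the assumed formula.

    component_types: rdf:type of each component (entity or relationship) of A.
    regions: list of (set_of_types, relevance_weight) pairs.
    """
    length = len(component_types)
    if length == 0:
        raise ValueError("an association needs at least one component")

    # Sum over regions of r_i * |X_i|; a region A does not pass through adds 0.
    weighted_hits = 0.0
    for region_types, region_weight in regions:
        hits = sum(1 for t in component_types if t in region_types)
        weighted_hits += region_weight * hits

    # |Z|: components whose type belongs to no region at all.
    outside = sum(
        1 for t in component_types
        if not any(t in region_types for region_types, _ in regions)
    )

    return (weighted_hits * (1 - outside / length)) / length


# Hypothetical regions mirroring the Financial (0.25) and Terrorist (0.75) domains.
regions = [
    ({"FinancialOrganization", "hasAccount"}, 0.25),
    ({"TerroristOrganization", "memberOf", "TerroristAttack"}, 0.75),
]
association = ["Person", "memberOf", "TerroristOrganization"]
weight = context_weight(association, regions)
```

Under this assumed formula, an association that lies entirely inside a single region of weight 1.0 scores 1.0, and every component outside all regions pulls the score toward 0.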

Subsumption
- Specialized instances are considered more relevant
- More "specific" relations convey more meaning
- Example, with the class hierarchy Organization > Political Organization > Democratic Political Organization:
  - "H. Dean member Of Democratic Party" is ranked higher
  - "H. Dean member Of AutoClub" is ranked lower

Subsumption Weight Formula
- Define the component subsumption weight (csw) of the i-th component $c_i$ in an association A such that

$$csw_i = \frac{H_{c_i}}{H_{height}}$$

  - $H_{c_i}$ is the position of component $c_i$ in hierarchy H
  - $H_{height}$ is the total height of the class/property hierarchy of the current branch
- Define the overall Subsumption weight of an association A as

$$S_A = \frac{1}{length(A)} \times \prod_{i=1}^{length(A)} csw_i$$

  - length(A) is the number of components in A
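A small sketch may help make the subsumption weight concrete. It rests on two assumptions that the transcript does not confirm: positions in the hierarchy are counted from 1 at the root of the branch, so the most specialized class gets csw = 1, and the overall weight is the product of the component weights divided by length(A). The function names and the three-level hierarchy are hypothetical.

```python
def component_subsumption_weight(position, branch_height):
    """csw of one component: its depth in the branch over the branch height."""
    return position / branch_height


def subsumption_weight(components):
    """Overall subsumption weight under the assumed product form.

    components: list of (position, branch_height) pairs, one per component of A.
    """
    if not components:
        raise ValueError("an association needs at least one component")
    product = 1.0
    for position, branch_height in components:
        product *= component_subsumption_weight(position, branch_height)
    return product / len(components)


# Hypothetical branch: Organization (1) > Political Organization (2)
# > Democratic Political Organization (3), so the branch height is 3.
specialized = subsumption_weight([(3, 3)])  # e.g. a Democratic Political Organization
general = subsumption_weight([(1, 3)])      # e.g. a plain Organization
```

The specialized component scores higher than the general one, which matches the intuition that more specific instances and relations should be ranked higher.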

Trust
- Entities and relationships originate from differently trusted sources
  - Assign trust values depending on the source
  - e.g., Reuters could be more trusted than some other news sources
- Adopt the following intuition
  - An association is only as strong as its weakest link
- The Trust weight of an association is the value of its least trustworthy component

Trust Weight Formula
- Let $t_{c_i}$ represent the component trust weight of the component $c_i$ in an association A
- Define the Trust weight of an overall association A as

$$T_A = \min_{i}\,(t_{c_i})$$
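The weakest-link rule is short enough to show directly. In this sketch the per-source trust values and the source names other than Reuters are made up for illustration; the slides do not give concrete numbers.

```python
def trust_weight(component_trust):
    """Trust weight of an association: the trust of its least trusted component."""
    if not component_trust:
        raise ValueError("an association needs at least one component")
    return min(component_trust)


# Hypothetical trust values assigned per source.
source_trust = {"Reuters": 0.9, "RegionalPaper": 0.7, "UnknownSite": 0.4}

# Each component of the association inherits the trust of the source it came from.
component_sources = ["Reuters", "UnknownSite", "RegionalPaper"]
weight = trust_weight([source_trust[s] for s in component_sources])
```

A single component from a weakly trusted source caps the trust of the whole association, regardless of how trusted the other components are.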

Rarity
- Many relationships and entities of the same type (rdf:type) will exist
- Two viewpoints
  - Rarely occurring associations can be considered more interesting
    - They imply uniqueness
    - Adopted from [3], where rarity is used in data mining of relational databases
  - Consider rare, infrequently occurring relationships more interesting

Rarity
- Alternate viewpoint
  - Interested in associations that are frequently occurring (common)
    - e.g., money laundering: individuals often engage in normal-looking, common-case transactions so as to avoid detection
- The user should determine which Rarity preference to use

Rarity Weight Formula
- Define the component rarity of the i-th component $c_i$ in A as $rar_i$ such that

$$rar_i = \frac{|M| - |N|}{|M|}$$

where

$$M = \{\, res_k \mid res_k \in K \,\}$$ (all instances and relationships in K), and

$$N = \{\, res_j \mid res_j \in K \wedge typeOf(res_j) = typeOf(c_i) \,\}$$

  - With the restriction that, in the case that $res_j$ and $c_i$ are both of type rdf:Property, the subject and object of $c_i$ and $res_j$ must be of the same rdf:type
- $rar_i$ captures the frequency of occurrence of the rdf:type of component $c_i$ with respect to the entire knowledge base

Rarity Weight Formula
- Define the overall Rarity weight $R_A$ of an association A as a function of all the components in A, such that

(a) $$R_A = \frac{1}{length(A)} \sum_{i=1}^{length(A)} rar_i$$

(b) $$R_A = 1 - \frac{1}{length(A)} \sum_{i=1}^{length(A)} rar_i$$

  - length(A) is the number of components in A
  - $rar_i$ is the component rarity of the i-th component in A
- To favor rare associations, (a) is used
- To favor more common associations, (b) is used
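The two rarity variants can be sketched as follows. The component rarity is assumed to be the share of the knowledge base that does not have the component's rdf:type, and the overall weight is assumed to be the average of the component rarities. For brevity the sketch ignores the extra restriction on the subject and object types of properties. The function names and the toy knowledge base are hypothetical.

```python
from collections import Counter


def component_rarity(type_counts, total, component_type):
    """Share of the knowledge base that does NOT have this component's type."""
    return (total - type_counts[component_type]) / total


def rarity_weight(kb_types, association_types, favor_rare=True):
    """Average component rarity (a), or its complement (b) to favor common ones.

    kb_types: rdf:type of every instance and relationship in the knowledge base.
    association_types: rdf:type of each component of the association.
    """
    counts = Counter(kb_types)
    total = len(kb_types)
    average = sum(
        component_rarity(counts, total, t) for t in association_types
    ) / len(association_types)
    return average if favor_rare else 1 - average


# Hypothetical knowledge base: 6 Person, 1 Country, 3 memberOf resources.
kb_types = ["Person"] * 6 + ["Country"] + ["memberOf"] * 3
association_types = ["Person", "memberOf", "Country"]
rare_score = rarity_weight(kb_types, association_types, favor_rare=True)
common_score = rarity_weight(kb_types, association_types, favor_rare=False)
```

Exposing the choice as a flag reflects the point that the user, not the system, decides whether rare or common associations are the interesting ones.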

Popularity
- Some entities have more incoming and outgoing relationships than others
  - View this as the Popularity of an entity
- Entities with high popularity can be thought of as hotspots
- Two viewpoints
  - Favor associations with popular entities
  - Favor unpopular associations

Popularity
- Favor popular associations
  - e.g., interested in the way two authors are related through co-authorship relations
    - Associations that pass through highly cited (popular) authors may be more relevant
- Alternate viewpoint: rank popular associations lower
  - Entities of type 'Country' have an extremely high number of incoming and outgoing relationships
    - They convey little information when querying for the way two persons are associated through geographic locations

Popularity Weight Formula
- Define the entity popularity $p_i$ of the i-th entity $e_i$ in association A as

$$p_i = \frac{|pop_{e_i}|}{\max_{1 \le j \le n} \left( |pop_{e_j}| \right)}, \quad \text{where } typeOf(e_i) = typeOf(e_j)$$

  - n is the total number of entities in the knowledge base
  - $pop_{e_i}$ is the set of incoming and outgoing relationships of $e_i$
  - $\max_{1 \le j \le n}(|pop_{e_j}|)$ represents the size of the largest such set among all entities in the knowledge base of the same class as $e_i$
- $p_i$ captures the Popularity of $e_i$ with respect to the most popular entity of its same rdf:type in the knowledge base

Popularity Weight Formula
- Define the overall Popularity weight $P_A$ of an association A such that

(a) $$P_A = \frac{1}{n} \sum_{i=1}^{n} p_i$$

(b) $$P_A = 1 - \frac{1}{n} \sum_{i=1}^{n} p_i$$

  - n is the number of entities (nodes) in A
  - $p_i$ is the entity popularity of the i-th entity in A
- To favor popular associations, (a) is used
- To favor less popular associations, (b) is used
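The popularity criterion can be sketched as below. Entity popularity is the entity's relationship count divided by the largest count among entities of the same class; the overall weight is assumed to be the average over the entities in the association. The function names, entity identifiers, and counts are hypothetical.

```python
def entity_popularity(degree, entity_class, entity):
    """Relationship count of an entity relative to the busiest entity of its class."""
    largest = max(
        degree[e] for e in degree if entity_class[e] == entity_class[entity]
    )
    return degree[entity] / largest if largest else 0.0


def popularity_weight(degree, entity_class, association_entities, favor_popular=True):
    """Average entity popularity (a), or its complement (b)."""
    average = sum(
        entity_popularity(degree, entity_class, e) for e in association_entities
    ) / len(association_entities)
    return average if favor_popular else 1 - average


# Hypothetical counts of incoming plus outgoing relationships per entity.
degree = {"author1": 10, "author2": 5, "usa": 200, "france": 100}
entity_class = {
    "author1": "Author", "author2": "Author",
    "usa": "Country", "france": "Country",
}
popular = popularity_weight(degree, entity_class, ["author2", "usa"])
unpopular = popularity_weight(degree, entity_class, ["author2", "usa"], favor_popular=False)
```

Normalizing within a class keeps a moderately connected author from being dwarfed by a country, whose raw relationship count is far larger.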

Association Length
- Two viewpoints
  - Interest in more direct associations (i.e., shorter associations)
    - May imply a stronger relationship between two entities
  - Interest in hidden, indirect, or discreet associations (i.e., longer associations)
    - Terrorist cells are often hidden
    - Money laundering involves deliberately innocuous-looking transactions

Association Length Weight
- Define the Association Length weight $L_A$ of an association A as

(a) $$L_A = \frac{1}{length(A)}$$

(b) $$L_A = 1 - \frac{1}{length(A)}$$

  - length(A) is the number of components in A
- To favor shorter associations, (a) is used
- To favor longer associations, (b) is used
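As a quick illustration, the two length variants reduce to a reciprocal and its complement. This sketch assumes length(A) counts both entities and relationships; the function name `length_weight` is hypothetical.

```python
def length_weight(length, favor_short=True):
    """Association length weight: 1/length(A) for (a), 1 - 1/length(A) for (b)."""
    if length < 1:
        raise ValueError("an association needs at least one component")
    reciprocal = 1 / length
    return reciprocal if favor_short else 1 - reciprocal


short_bias = length_weight(4, favor_short=True)
long_bias = length_weight(4, favor_short=False)
```

Under (a) the weight shrinks as associations grow, while under (b) it approaches 1 for long, indirect associations.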

Overall Ranking Criterion
- The overall Association Rank of a Semantic Association is a linear function:

$$\text{Ranking Score} = k_1 \times \text{Context} + k_2 \times \text{Subsumption} + k_3 \times \text{Trust} + k_4 \times \text{Rarity} + k_5 \times \text{Popularity} + k_6 \times \text{Association Length}$$

  - where the weights $k_i$ sum to 1.0
- Allows flexible ranking criteria
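The linear combination can be sketched as a weighted sum followed by a sort. The six criterion names follow the slide; the dictionary layout, the function names, and the sample weights are assumptions made for illustration.

```python
CRITERIA = ("context", "subsumption", "trust", "rarity", "popularity", "length")


def ranking_score(weights, k):
    """Linear combination of the six criterion weights; the k values must sum to 1.0."""
    if abs(sum(k[c] for c in CRITERIA) - 1.0) > 1e-9:
        raise ValueError("the coefficients k_i must sum to 1.0")
    return sum(k[c] * weights[c] for c in CRITERIA)


def rank(associations, k):
    """Return associations ordered from highest to lowest ranking score."""
    return sorted(
        associations,
        key=lambda a: ranking_score(a["weights"], k),
        reverse=True,
    )


# Hypothetical user preference: only context and trust matter, equally.
k = {"context": 0.5, "subsumption": 0.0, "trust": 0.5,
     "rarity": 0.0, "popularity": 0.0, "length": 0.0}

associations = [
    {"id": "b", "weights": {"context": 0.25, "subsumption": 0.9, "trust": 1.0,
                            "rarity": 0.3, "popularity": 0.6, "length": 0.5}},
    {"id": "a", "weights": {"context": 1.0, "subsumption": 0.2, "trust": 0.5,
                            "rarity": 0.7, "popularity": 0.1, "length": 0.25}},
]
ordered = [a["id"] for a in rank(associations, k)]
```

Setting a coefficient to 0 switches a criterion off entirely, which is what makes the scheme query-dependent: the same set of associations can be reordered just by changing k.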

System Implementation
- The ranking approach has been implemented within the LSDIS Lab's SemDIS² and SAI³ projects

System Implementation
- Native main-memory data structures for interaction with the RDF graph
- Naïve depth-first search algorithm for discovering Semantic Associations
- A subset of SWETO has been used as the data set
  - Approximately 50,000 entities and 125,000 relationships
- The SemDIS prototype⁴, including ranking, is accessible through a Web interface

⁴ SemDIS Prototype:

Ranking Configuration
- The user is provided with a Web interface that gives her/him the ability to customize the ranking criteria
- A modified version of TouchGraph⁵ is used to define the query context
  - A Java applet for visual interaction with a graph

⁵ TouchGraph Homepage:

Context Specification Interface

Ranking Configuration Interface

Ranking Module
- Java implementation of the ranking approach
- Unranked associations are traversed and ranked according to the ranking criteria defined by the user
- Ranking is decomposed into finding the context, subsumption, trust, rarity, and popularity rank of all entities in each association

Ranking Module
- Context, subsumption, trust, and rarity ranks of each relationship are found during the traversal as well
  - When the RDF data is parsed, rarity, popularity, trust, and subsumption statistics of both entities and relationships are maintained
  - Finding the context rank consists of checking which context regions, if any, each entity or relationship in each association belongs to

Ranked Results Interface

Ranking Evaluation
- Evaluation metrics such as precision and recall do not accurately measure the ranking approach
- Used a panel of five human subjects for evaluation
  - Due to the various ways to interpret associations

Ranking Evaluation
- Evaluation process
  - Subjects were given randomly sorted results from different queries, each consisting of approximately 50 results
  - Subjects were provided with the ranking criteria for each query
    - i.e., context, whether to favor short/long, rare/common associations, etc.
  - Subjects were provided with the type(s) of the components in the associations
    - To measure context relevance
  - Subjects ranked the associations based on this modeled interest and emphasized criterion

Ranking Evaluation (1)

Ranking Evaluation (2)
- Average distance of system rank from that given by subjects
  - Based on relative order
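One plausible reading of this measure is the mean absolute difference between the position the system gives an association and the position a subject gives it. The sketch below implements that reading only as an illustration; the exact distance used in the evaluation is not stated in the slides, and the function name and sample orderings are hypothetical.

```python
def average_rank_distance(system_order, subject_order):
    """Mean absolute difference in position between two orderings of the same items."""
    subject_position = {item: i for i, item in enumerate(subject_order)}
    distances = [
        abs(i - subject_position[item]) for i, item in enumerate(system_order)
    ]
    return sum(distances) / len(distances)


# Hypothetical orderings of four associations; the subject swaps the first two.
system_order = ["a1", "a2", "a3", "a4"]
subject_order = ["a2", "a1", "a3", "a4"]
distance = average_rank_distance(system_order, subject_order)
```

A value of 0 means the system and the subject agree on every position; larger values mean the system's order drifts further from the human one.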

Conclusions
- Defined a flexible, query-dependent approach to relevantly rank Semantic Association query results
- Presented a prototype implementation of the ranking approach
- Empirically evaluated the ranking scheme
  - Found that the proposed approach is able to capture the user's interest and rank results in a relevant fashion

Future Work
- 'Ranking-on-the-Fly'
  - Ranks can be assigned to associations as the algorithm is traversing them
  - Possible performance improvements
- Use of the ranking scheme for the Semantic Association discovery algorithms (scalability in very large data sets)
  - Utilize context to guide the depth-first search
  - Associations that fall below a predetermined minimal rank could be discarded
- Additional work on context specification
- Develop ranking metrics for Semantic Similarity Associations

Publications
[1] Chris Halaschek, Boanerges Aleman-Meza, I. Budak Arpinar, Cartic Ramakrishnan, and Amit Sheth, A Flexible Approach for Analyzing and Ranking Complex Relationships on the Semantic Web, Third International Semantic Web Conference, Hiroshima, Japan, November 7-11, 2004 (submitted)
[2] Chris Halaschek, Boanerges Aleman-Meza, I. Budak Arpinar, and Amit Sheth, Discovering and Ranking Semantic Associations over a Large RDF Metabase, 30th Int. Conf. on Very Large Data Bases, August 30 to September 3, 2004, Toronto, Canada. Demonstration Paper
[3] Boanerges Aleman-Meza, Chris Halaschek, Amit Sheth, I. Budak Arpinar, and Gowtham Sannapareddy, SWETO: Large-Scale Semantic Web Test-bed, International Workshop on Ontology in Action, Banff, Canada, June 20-24, 2004
[4] Boanerges Aleman-Meza, Chris Halaschek, I. Budak Arpinar, and Amit Sheth, Context-Aware Semantic Association Ranking, First International Workshop on Semantic Web and Databases, Berlin, Germany, September 7-8, 2003

References
[1]
[2] BERNERS-LEE, T., HENDLER, J., AND LASSILA, O. The Semantic Web. Scientific American (May 2001).
[3] LIN, S., AND CHALUPSKY, H. Unsupervised Link Discovery in Multi-relational Data via Rarity Analysis. The Third IEEE International Conference on Data Mining.